SOLUBLE PEPTIDES HAVING
CONSTRAINED, SECONDARY CONFORMATION IN 
SOLUTION AND METHOD OF MAKING SAME 



BACKGROUND OF THE INVENTION 

The biological function of a peptide depends upon
its direct, physical interaction with another molecule.
The peptide or protein is termed the ligand.

Peptides are distinguishable by their specificity
for certain ligand-binding proteins. The specificity of
binding, i.e., the discrimination between closely related
ligands, is determined by a peptide's binding affinity.
Peptides having useful binding properties are invaluable
for chemotherapy and drug design. Therefore, a need exists
for the generation of peptides having biologically useful
binding affinities and being soluble in solution.

The secondary structure of a peptide is critical for
determining its binding affinity. For example, a highly
flexible peptide is able to interact with many distinct
molecules; however, the peptide-ligand interaction is
easily disrupted, or in other words, the binding affinity
of the peptide is low. By contrast, a peptide having a
specific secondary structure is able to bind tightly to
only one or a few ligands.

However, if the secondary structure of the ligand
results from non-covalent interactions, the peptide
inevitably is insoluble. Intra-peptide covalent bonds can
solve this problem, resulting in constrained peptides,
i.e., soluble peptides having a stable secondary structure
in solution.

This invention provides a method to synthesize
soluble peptides having constrained, secondary conformation
in solution, as well as the peptides produced by this
method.
This invention also relates generally to methods
for synthesizing and expressing oligonucleotides and, more
particularly, to methods for expressing oligonucleotides
having biased, but random codon sequences.

Oligonucleotide synthesis proceeds via linear
coupling of individual monomers in a stepwise reaction.
The reactions are generally performed on a solid phase
support by first coupling the 3' end of the first monomer
to the support. The second monomer is added to the 5' end
of the first monomer in a condensation reaction to yield a
dinucleotide coupled to the solid support. At the end of
each coupling reaction, the by-products and unreacted, free
monomers are washed away so that the starting material for
the next round of synthesis is the pure oligonucleotide
attached to the support. In this reaction scheme, the
stepwise addition of individual monomers to a single,
growing end of an oligonucleotide ensures accurate synthesis
of the desired sequence. Moreover, unwanted side reactions,
such as the condensation of two oligonucleotides, are
eliminated, resulting in high product yields.

In some instances, it is desired that synthetic
oligonucleotides have random nucleotide sequences. This
result can be accomplished by adding equal proportions of
all four nucleotides in the monomer coupling reactions,
leading to the random incorporation of all nucleotides and
yielding a population of oligonucleotides with random
sequences. Since all possible combinations of nucleotide
sequences are represented within the population, all
possible codon triplets will also be represented. If the
objective is ultimately to generate random peptide
products, this approach has a severe limitation because the
random codons synthesized will bias the amino acids
incorporated during translation of the DNA by the cell into
polypeptides.
The bias is due to the redundancy of the genetic
code. There are four nucleotide monomers, which leads to
sixty-four possible triplet codons. With only twenty amino
acids to specify, many of the amino acids are encoded by
multiple codons. Therefore, a population of
oligonucleotides synthesized by sequential addition of
monomers from a random population will not encode peptides
whose amino acid sequence represents all possible
combinations of the twenty different amino acids in equal
proportions. That is, the frequency of amino acids
incorporated into polypeptides will be biased toward those
amino acids which are specified by multiple codons.

To alleviate amino acid bias due to the
redundancy of the genetic code, the oligonucleotides can be
synthesized from nucleotide triplets. Here, a triplet
coding for each of the twenty amino acids is synthesized
from individual monomers. Once synthesized, the triplets
are used in the coupling reactions instead of individual
monomers. By mixing equal proportions of the triplets,
synthesis of oligonucleotides with random codons can, in
principle, be accomplished. However, this is not feasible
because of the inefficiency of the triplet coupling, which
is less than 3%, and the high cost of synthesis.

Amino acid bias can be reduced, however, by
synthesizing the degenerate codon sequence NNK where N is
a mixture of all four nucleotides and K is a mixture of
guanine and thymine nucleotides. Each position within an
oligonucleotide having this codon sequence will contain a
total of 32 codons (12 amino acids being represented once,
5 represented twice, 3 represented three times and one
codon being a stop codon). Oligonucleotides
expressed with such degenerate codon sequences will produce
peptide products whose sequences are biased toward those
amino acids being represented more than once. Thus,
populations of peptides whose sequences are completely
random cannot be obtained from oligonucleotides synthesized
from degenerate sequences.
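
The codon counts quoted for NNK can be checked by direct
enumeration. The following is a minimal sketch, assuming
the standard genetic code; the names CODON_TABLE,
nnk_codons and nnk_usage are illustrative and do not
appear in the specification.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, TTG, TCT, ... order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """All codons of the form NNK: N is any base, K is G or T."""
    return [a + b + k for a in BASES for b in BASES for k in "GT"]


def nnk_usage():
    """Number of NNK codons that specify each amino acid (or stop)."""
    return Counter(CODON_TABLE[codon] for codon in nnk_codons())


usage = nnk_usage()
# How many amino acids are encoded once, twice or three times.
multiplicity = Counter(n for aa, n in usage.items() if aa != "*")

if __name__ == "__main__":
    print(len(nnk_codons()), "codons in total")
    print("stop codons:", usage["*"])
    print("amino acids by codon count:", dict(multiplicity))
```

The amino acids specified by three NNK codons (Leu, Ser
and Arg) are the ones toward which the expressed peptides
are most strongly biased.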

There thus exists a need for a method to express
oligonucleotides having a fully random or desirably biased
sequence which alleviates genetic redundancy. The present
invention satisfies these needs and provides additional
advantages as well.

SUMMARY OF THE INVENTION 

This invention provides a peptide having
constrained, secondary structure in solution as well as
methods of synthesizing these peptides.

The invention provides a plurality of procaryotic
cells containing a diverse population of expressible
oligonucleotides encoding soluble peptides having
constrained secondary structure or conformation in
solution, the expressible oligonucleotides being
operationally linked to expression elements, the
expressible oligonucleotides further characterized as
having a desirable bias of random codon sequences.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic drawing for synthesizing 
oligonucleotides from nucleotide monomers with random 
tuplets at each position using twenty reaction vessels. 

Figure 2 is a schematic drawing for synthesizing
oligonucleotides from nucleotide monomers with random
tuplets at each position using ten reaction vessels.

Figure 3 is a schematic diagram of the two
vectors used for sublibrary and library production from
precursor oligonucleotide portions. M13IX22 (Figure 3A) is
the vector used to clone the anti-sense precursor portions
(hatched box). The single-headed arrow represents the Lac
p/o expression sequences and the double-headed arrow
represents the portion of M13IX22 which is to be combined
with M13IX42. The amber stop codon for biological
selection and relevant restriction sites are also shown.
M13IX42 (Figure 3B) is the vector used to clone the sense
precursor portions (open box). Thick lines represent the
pseudo-wild type (ψgVIII) and wild type (gVIII) gene VIII
sequences. The double-headed arrow represents the portion
of M13IX42 which is to be combined with M13IX22. The two
amber stop codons and relevant restriction sites are also
shown. Figure 3C shows the joining of vector populations
from sublibraries to form the functional surface expression
vector M13IX. Figure 3D shows the generation of a surface
expression library in a non-suppressor strain and the
production of phage. The phage are used to infect a
suppressor strain (Figure 3E) for surface expression and
screening of the library.

Figure 4 is a schematic diagram of the vector
used for generation of surface expression libraries from
random oligonucleotide populations (M13IX30). The symbols
are as described for Figure 3.

Figure 5 is the nucleotide sequence of M13IX42
(SEQ ID NO: 1).

Figure 6 is the nucleotide sequence of M13IX22
(SEQ ID NO: 2).

Figure 7 is the nucleotide sequence of M13IX30
(SEQ ID NO: 3).

Figure 8 is the nucleotide sequence of M13ED03
(SEQ ID NO: 4).
Figure 9 is the nucleotide sequence of M13IX421
(SEQ ID NO: 5).

Figure 10 is the nucleotide sequence of M13ED04
(SEQ ID NO: 6).

DETAILED DESCRIPTION OF THE INVENTION

This invention is directed to a simple and
inexpensive method for synthesizing and expressing
oligonucleotides having a desirable bias of random codons
using individual monomers. The oligonucleotides produced
by this method encode soluble peptides having constrained
secondary structure in solution. The method is
advantageous in that individual monomers are used instead
of triplets and, by synthesizing only a non-degenerate
subset of all triplets, codon redundancy is alleviated.
Thus, the oligonucleotides synthesized represent a large
proportion of possible random triplet sequences which can
be obtained. The oligonucleotides can be expressed, for
example, on the surface of filamentous bacteriophage in a
form which does not alter phage viability or impose
biological selections against certain peptide sequences.
The oligonucleotides produced are therefore useful for
generating an unlimited number of pharmacological and
research products.

This invention entails the sequential coupling of
monomers to produce oligonucleotides with a desirable bias
of random codons. The coupling reactions for the
randomization of twenty codons which specify the amino
acids of the genetic code are performed in ten different
reaction vessels. Each reaction vessel contains a support
on which the monomers for two different codons are coupled
in three sequential reactions. One of the reactions
couples an equal mixture of two monomers such that the
final product has two different codon sequences. The
codons are randomized by removing the supports from the
reaction vessels and mixing them to produce a single batch
of supports containing all twenty codons at a particular
position. Synthesis at the next codon position proceeds by
equally dividing the mixed batch of supports into ten
reaction vessels as before and sequentially coupling the
monomers for each pair of codons. The supports are again
mixed to randomize the codons at the position just
synthesized. The cycle of coupling, mixing and dividing
continues until the desired number of codon positions have
been randomized. After the last position has been
randomized, the oligonucleotides with random codons are
cleaved from the support. The random oligonucleotides can
then be expressed, for example, on the surface of
filamentous bacteriophage as gene VIII-peptide fusion
proteins. Alternative genes can be used as well. Using
this method, one can randomize oligonucleotides at certain
positions and select for specific oligonucleotides at
others.

This invention provides a diverse population of
synthetic biased oligonucleotides contained in vectors so
as to be expressible in cells. In the preferred embodiment
of this invention, the oligonucleotides are fully defined
in that at least two codons encode amino acids capable of
forming a covalent bond. The populations of
oligonucleotides can be expressed as fusion products in
combination with surface proteins of filamentous
bacteriophage, such as M13, as with gene VIII. The vectors
can be transfected into a plurality of cells, such as the
procaryote E. coli.

In one embodiment, the diverse population of
oligonucleotides can be formed by randomly combining first
and second precursor populations, each or either precursor
population having a desirable bias of random codon
sequences. Methods of synthesizing and expressing the
diverse population of expressible oligonucleotides are also
provided.

Two precursor populations of random precursor
oligonucleotides are synthesized in one embodiment. The
oligonucleotides within each population encode a portion of
the final oligonucleotide that is expressed.
Oligonucleotides within one precursor population encode the
carboxy terminal portion of the expressed oligonucleotides.
In one embodiment, these oligonucleotides are cloned in
frame with a gene VIII (gVIII) sequence so that translation
of the sequence produces peptide fusion proteins. The
second population of precursor oligonucleotides is cloned
into a separate vector. Each precursor oligonucleotide
within this population encodes the anti-sense of the amino
terminal portion of the expressed oligonucleotides. This
vector also contains the elements necessary for expression.
The two vectors containing the random oligonucleotides are
combined such that the two precursor oligonucleotide
portions are joined together at random to form a population
of larger oligonucleotides derived from two smaller
portions. The vectors contain selectable markers to ensure
maximum efficiency in joining together the two
oligonucleotide populations. A mechanism also exists to
control the expression of gVIII-peptide fusion proteins
during library construction and screening.

As used herein, the term "monomer" or "nucleotide
monomer" refers to individual nucleotides used in the
chemical synthesis of oligonucleotides. Monomers that can
be used include both the ribo- and deoxyribo- forms of each
of the five standard nucleotides (derived from the bases
adenine (A or dA, respectively), guanine (G or dG),
cytosine (C or dC), thymine (T) and uracil (U)).
Derivatives and precursors of bases such as inosine which
are capable of supporting polypeptide biosynthesis are also
included as monomers. Also included are chemically
modified nucleotides, for example, one having a reversible
blocking agent attached to any of the positions on the
purine or pyrimidine bases, the ribose or deoxyribose sugar
or the phosphate or hydroxyl moieties of the monomer. Such
blocking groups include, for example, dimethoxytrityl,
benzoyl, isobutyryl, beta-cyanoethyl and diisopropylamine
groups, and are used to protect hydroxyls, exocyclic amines
and phosphate moieties. Other blocking agents can also be
used and are known to one skilled in the art.

As used herein, the term "tuplet" refers to a
group of elements of a definable size. The elements of a
tuplet as used herein are nucleotide monomers. For
example, a tuplet can be a dinucleotide, a trinucleotide or
can also be four or more nucleotides.

As used herein, the term "codon" or "triplet"
refers to a tuplet consisting of three adjacent nucleotide
monomers which specify one of the twenty naturally
occurring amino acids found in polypeptide biosynthesis.
The term also includes nonsense, or stop, codons which do
not specify any amino acid.

"Random codons" or "randomized codons," as used 
herein, refers to more than one codon at a position within
a collection of oligonucleotides. The number of different
codons can be from two to twenty at any particular
position. "Randomized oligonucleotides," as used herein,
refers to a collection of oligonucleotides with random
codons at one or more positions. "Random codon sequences"
as used herein means that more than one codon position
within a randomized oligonucleotide contains random codons.
For example, if randomized oligonucleotides are six
nucleotides in length (i.e., two codons) and both the first
and second codon positions are randomized to encode all
twenty amino acids, then a population of oligonucleotides
having random codon sequences with every possible
combination of the twenty triplets in the first and second
position makes up the above population of randomized
oligonucleotides. The number of possible codon
combinations is 20^2. Likewise, if randomized
oligonucleotides of fifteen nucleotides in length are
synthesized which have random codon sequences at all
positions encoding all twenty amino acids, then all
triplets coding for each of the twenty amino acids will be
found in equal proportions at every position. The
population constituting the randomized oligonucleotides
will contain 20^5 different possible species of
oligonucleotides. "Random tuplets," or "randomized
tuplets" are defined analogously.
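
The species counts in this definition follow from raising
the number of codons allowed at each position to the
number of codon positions. A minimal sketch of that
arithmetic is given below; the function name
species_count is an illustrative assumption.

```python
def species_count(codons_per_position: int, nucleotides: int) -> int:
    """Number of distinct oligonucleotides when every codon position
    is randomized over the same number of codons."""
    if nucleotides % 3 != 0:
        raise ValueError("length must be a whole number of codons")
    positions = nucleotides // 3
    return codons_per_position ** positions


if __name__ == "__main__":
    # Six nucleotides (two codons), twenty codons per position.
    print(species_count(20, 6))
    # Fifteen nucleotides (five codons), twenty codons per position.
    print(species_count(20, 15))
```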

As used herein, the term "bias" refers to a
preference. It is understood that there can be degrees of
preference or bias toward codon sequences which encode
particular amino acids. For example, an oligonucleotide
whose codon sequences do not preferably encode particular
amino acids is unbiased and therefore completely random.
The oligonucleotide codon sequences can also be biased
toward predetermined codon sequences or codon frequencies
and while still diverse and random, will exhibit codon
sequences biased toward a defined, or preferred, sequence.
"A desirable bias of random codon sequences" as used
herein, refers to the predetermined degree of bias which
can be selected from totally random to essentially, but not
totally, defined (or preferred). There must be at least
one codon position which is variable, however.

As used herein, the term "support" refers to a
solid phase material for attaching monomers for chemical
synthesis. Such support is usually composed of materials
such as beads of controlled pore glass but can be other
materials known to one skilled in the art. The term is
also meant to include one or more monomers coupled to the
support for additional oligonucleotide synthesis reactions.
As used herein, the terms "coupling" or
"condensing" refer to the chemical reactions for attaching
one monomer to a second monomer or to a solid support.
Such reactions are known to one skilled in the art and are
typically performed on an automated DNA synthesizer such as
a MilliGen/Biosearch Cyclone Plus Synthesizer using
procedures recommended by the manufacturer. "Sequentially
coupling" as used herein, refers to the stepwise addition
of monomers.

The term "soluble peptide" means a peptide that
is soluble at a concentration equivalent to its affinity to
a receptor. The peptide can then be used in aqueous
solution without being attached to a cell or phage.

The term "constrained secondary structure in
solution" means a peptide having a covalent bond that is
not the backbone peptide bond.

A method of synthesizing oligonucleotides having
biased random tuplets using individual monomers is
described. The method consists of several steps, the first
being synthesis of a nucleotide tuplet for each tuplet to
be randomized. As described here and below, a nucleotide
triplet (i.e., a codon) will be used as a specific example
of a tuplet. Any size tuplet will work using the methods
disclosed herein, and one skilled in the art would know how
to use the methods to randomize tuplets of any size.

If the randomization of codons specifying all
twenty amino acids is desired at a position, then twenty
different codons are synthesized. Likewise, if
randomization of only ten codons at a particular position
is desired then those ten codons are synthesized.
Randomization of codons from two to sixty-four can be
accomplished by synthesizing each desired triplet.
Preferably, randomization of from two to twenty codons is
used for any one position because of the redundancy of the
genetic code. The codons selected at one position do not
have to be the same codons selected at the next position.
Additionally, the sense or anti-sense sequence
oligonucleotide can be synthesized. The process therefore
provides for randomization of any desired codon position
with any number of codons. In addition, it also allows one
to preselect a specified codon to be present at a
particular position within a randomized sequence.

Codons to be randomized are synthesized
sequentially by coupling the first monomer of each codon to
separate supports. The supports for the synthesis of each
codon can, for example, be contained in different reaction
vessels such that one reaction vessel corresponds to the
monomer coupling reactions for one codon. As will be used
here and below, if twenty codons are to be randomized, then
twenty reaction vessels can be used in independent coupling
reactions for the first monomer of each of the twenty
codons. Synthesis proceeds by sequentially coupling the
second monomer of each codon to the first monomer to produce
a dimer, followed by coupling the third monomer for each
codon to each of the above-synthesized dimers to produce a
trimer (Figure 1, step 1, where M1, M2 and M3 represent the
first, second and third monomer, respectively, for each
codon to be randomized).

Following synthesis of the first codons from
individual monomers, the randomization is achieved by
mixing the supports from all twenty reaction vessels which
contain the individual codons to be randomized. The solid
phase support can be removed from its vessel and mixed to
achieve a random distribution of all codon species within
the population (Figure 1, step 2). The mixed population of
supports, constituting all codon species, is then
redistributed into twenty independent reaction vessels
(Figure 1, step 3). The resultant vessels are all
identical and contain equal portions of all twenty codons
coupled to a solid phase support.

For randomization of the second position codon,
synthesis of twenty additional codons is performed in each
of the twenty reaction vessels produced in step 3 as the
condensing substrates of step 1 (Figure 1, step 4). Steps
1 and 4 are therefore equivalent except that step 4 uses
the supports produced by the previous synthesis cycle
(steps 1 through 3) for codon synthesis whereas step 1 is
the initial synthesis of the first codon in the
oligonucleotide. The supports resulting from step 4 will
each have two codons attached to them (i.e., a
hexanucleotide) with the codon at the first position being
any one of twenty possible codons (i.e., random) and the
codon at the second position being one of the twenty
possible codons.

For randomization of the codon at the second
position and synthesis of the third position codon, steps
2 through 4 are again repeated. This process yields in
each vessel a three codon oligonucleotide (i.e., 9
nucleotides) with codon positions 1 and 2 randomized and
position three containing one of the twenty possible
codons. Steps 2 through 4 are repeated to randomize the
third position codon and synthesize the codon at the next
position. The process is continued until an
oligonucleotide of the desired length is achieved. After
the final randomization step, the oligonucleotide can be
cleaved from the supports and isolated by methods known to
one skilled in the art. Alternatively, the
oligonucleotides can remain on the supports for use in
methods employing probe hybridization.
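
The couple, mix and divide cycle of steps 1 through 4 can
be sketched as a small simulation. This is an
illustration under stated assumptions, not the synthesis
chemistry itself: each support is modeled as a string of
the codons coupled so far (in order of synthesis), the
particular codon chosen for each amino acid is an
arbitrary illustrative choice, and the names
split_and_mix and codon_counts are hypothetical.

```python
import random
from collections import Counter

# One codon per amino acid (illustrative choice), one codon per vessel.
CODONS = [
    "TTT", "GTT", "TCT", "CCT", "TAT", "CAT", "TGT", "CGT", "CTG", "ATG",
    "CAG", "GAG", "ACT", "GCT", "AAT", "GAT", "TGG", "GGG", "ATA", "AAA",
]


def split_and_mix(n_supports, n_positions, codons=CODONS, seed=0):
    """Simulate randomization of n_positions codon positions."""
    rng = random.Random(seed)
    supports = [""] * n_supports
    n_vessels = len(codons)
    for _ in range(n_positions):
        # Step 2: mix all supports into a single batch.
        rng.shuffle(supports)
        # Step 3: divide the batch into equal portions, one per vessel.
        vessels = [supports[i::n_vessels] for i in range(n_vessels)]
        # Steps 1 and 4: each vessel couples the three monomers of one codon.
        supports = [
            chain + codon
            for vessel, codon in zip(vessels, codons)
            for chain in vessel
        ]
    return supports


def codon_counts(pool, position):
    """Count the codons found at a zero-based codon position."""
    return Counter(chain[3 * position:3 * position + 3] for chain in pool)


if __name__ == "__main__":
    pool = split_and_mix(n_supports=4000, n_positions=3)
    print(len(set(pool)), "distinct 9-nucleotide sequences in the pool")
    print(codon_counts(pool, 0))
```

Because the batch is divided equally before each coupling
round, every codon appears on the same number of supports
at every position, which is the equal representation the
method is meant to achieve.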

The diversity of codon sequences, i.e., the
number of different possible oligonucleotides, that can be
obtained using the methods of the present invention, is
extremely large and only limited by the physical
characteristics of available materials. For example, a
support composed of beads of about 100 µm in diameter will
be limited to about 10,000 beads/reaction vessel using a 1
µM reaction vessel containing 25 mg of beads. This size
bead can support about 1 x 10^ oligonucleotides per bead.
Synthesis using separate reaction vessels for each of the
twenty amino acids will produce beads in which all the
oligonucleotides attached to an individual bead are
identical. The diversity which can be obtained under these
conditions is approximately 10^ copies of 10,000 x 20 or
200,000 different random oligonucleotides. The diversity
can be increased, however, in several ways without
departing from the basic methods disclosed herein. For
example, the number of possible sequences can be increased
by decreasing the size of the individual beads which make
up the support. A bead of about 30 µm in diameter will
increase the number of beads per reaction vessel and
therefore the number of oligonucleotides synthesized.
Another way to increase the diversity of oligonucleotides
with random codons is to increase the volume of the
reaction vessel. For example, using the same size bead, a
larger volume can contain a greater number of beads than a
smaller vessel and therefore support the synthesis of a
greater number of oligonucleotides. Increasing the number
of codons coupled to a support in a single reaction vessel
also increases the diversity of the random
oligonucleotides. The total diversity will be the number
of codons coupled per vessel raised to the number of codon
positions synthesized. For example, using ten reaction
vessels, each synthesizing two codons to randomize a total
of twenty codons, the number of different oligonucleotides
of ten codons in length per 100 µm bead can be increased
where each bead will contain about 2^10 or 1 x 10^3
different sequences instead of one. One skilled in the art
will know how to modify such parameters to increase the
diversity of oligonucleotides with random codons.
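
The two figures that can be recomputed from this passage
are the bead-limited diversity with one codon per vessel
and the number of sequences per bead when two codons are
coupled per vessel. A minimal sketch, with hypothetical
function names, follows.

```python
def bead_limited_diversity(beads_per_vessel: int, vessels: int) -> int:
    """Upper bound on distinct sequences when every bead carries a
    single sequence: one sequence per bead, summed over all vessels."""
    return beads_per_vessel * vessels


def sequences_per_bead(codons_per_vessel: int, codon_positions: int) -> int:
    """Distinct sequences on one bead: codons coupled per vessel raised
    to the number of codon positions synthesized."""
    return codons_per_vessel ** codon_positions


if __name__ == "__main__":
    # Twenty vessels, about 10,000 beads each, one codon per vessel.
    print(bead_limited_diversity(10_000, 20))
    # Ten vessels, two codons per vessel, ten codon positions.
    print(sequences_per_bead(2, 10))
```

With one codon per vessel each bead carries a single
sequence, so sequences_per_bead(1, 10) is 1; coupling two
codons per vessel raises this to 1,024, or about 1 x 10^3.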
A method of synthesizing oligonucleotides having
random codons at each position using individual monomers
wherein the number of reaction vessels is less than the
number of codons to be randomized is also described. For
example, if twenty codons are to be randomized at each
position within an oligonucleotide population, then ten
reaction vessels can be used. The use of a smaller number
of reaction vessels than the number of codons to be
randomized at each position is preferred because the
smaller number of reaction vessels is easier to manipulate
and results in a greater number of possible
oligonucleotides synthesized.

The use of a smaller number of reaction vessels
for random synthesis of twenty codons at a desired position
within an oligonucleotide is similar to that described
above using twenty reaction vessels except that each
reaction vessel can contain the synthesis products of more
than one codon. For example, step one synthesis using ten
reaction vessels proceeds by coupling about two different
codons on supports contained in each of ten reaction
vessels. This is shown in Figure 2 where each of the two
codons coupled to a different support can consist of the
following sequences: (1) (T/G)TT for Phe and Val; (2)
(T/C)CT for Ser and Pro; (3) (T/C)AT for Tyr and His; (4)
(T/C)GT for Cys and Arg; (5) (C/A)TG for Leu and Met; (6)
(C/G)AG for Gln and Glu; (7) (A/G)CT for Thr and Ala; (8)
(A/G)AT for Asn and Asp; (9) (T/G)GG for Trp and Gly and
(10) A(T/A)A for Ile and Lys. The slash (/) signifies that
a mixture of the monomers indicated on each side of the
slash is used as if it were a single monomer in the
indicated coupling step. The antisense sequence for each
of the above codons can be generated by synthesizing the
complementary sequence. For example, the antisense for Phe
and Val can be AA(C/A). The amino acids encoded by each of
the above pairs of sequences are given in the standard
three letter nomenclature.
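
The ten degenerate sequences listed above can be expanded
and translated to confirm that together they specify
twenty different amino acids and no stop codon. The
sketch below assumes the standard genetic code, under
which ATA and AAA in vessel (10) translate to Ile and Lys;
the names VESSELS, expand and antisense are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, TTG, TCT, ... order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Each vessel is three coupling steps; a two-letter entry is a monomer mixture.
VESSELS = [
    ("TG", "T", "T"), ("TC", "C", "T"), ("TC", "A", "T"), ("TC", "G", "T"),
    ("CA", "T", "G"), ("CG", "A", "G"), ("AG", "C", "T"), ("AG", "A", "T"),
    ("TG", "G", "G"), ("A", "TA", "A"),
]


def expand(degenerate):
    """All concrete codons produced by one vessel's coupling steps."""
    return ["".join(bases) for bases in product(*degenerate)]


def antisense(degenerate):
    """Reverse complement of a degenerate codon, position by position."""
    complement = str.maketrans("ACGT", "TGCA")
    return tuple(step.translate(complement) for step in reversed(degenerate))


# One-letter amino acid codes encoded in each vessel.
encoded = [[CODON_TABLE[codon] for codon in expand(v)] for v in VESSELS]
all_amino_acids = {aa for pair in encoded for aa in pair}

if __name__ == "__main__":
    for number, (vessel, pair) in enumerate(zip(VESSELS, encoded), start=1):
        print(number, expand(vessel), pair)
    print(len(all_amino_acids), "distinct amino acids")
    print("antisense of vessel 1:", antisense(VESSELS[0]))
```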
Coupling of the monomers in this fashion will
yield codons specifying all twenty of the naturally
occurring amino acids attached to supports in ten reaction
vessels. However, the number of individual reaction
vessels to be used will depend on the number of codons to
be randomized at the desired position and can be determined
by one skilled in the art. For example, if ten codons are
to be randomized, then five reaction vessels can be used
for coupling. The codon sequences given above can be used
for this synthesis as well. The sequences of the codons
can also be changed to incorporate or be replaced by any of
the additional forty-four codons which constitute the
genetic code.

The remaining steps of synthesis of
oligonucleotides with random codons using a smaller number
of reaction vessels are as outlined above for synthesis
with twenty reaction vessels except that the mixing and
dividing steps are performed with supports from about half
the number of reaction vessels. These remaining steps are
shown in Figure 2 (steps 2 through 4).

Oligonucleotides having at least one specified
tuplet at a predetermined position and the remaining
positions having random tuplets are synthesized using the
methods described herein. The synthesis steps are similar
to those outlined above using twenty or fewer reaction
vessels except that prior to synthesis of the specified
codon position, the dividing of the supports into separate
reaction vessels for synthesis of different codons is
omitted. For example, if the codon at the second position
of the oligonucleotide is to be specified, then following
synthesis of random codons at the first position and mixing
of the supports, the mixed supports are not divided into
new reaction vessels but, instead, are contained in a
single reaction vessel to synthesize the specified codon.
The specified codon is synthesized sequentially from
individual monomers as described above. Thus, the number
of reaction vessels is increased or decreased at each step
to allow for the synthesis of a specified codon or a
desired number of random codons. In the most preferred
embodiment of this invention, the specified codons are
codons encoding amino acids capable of forming covalent
bonds, e.g., cysteine, glutamic acid, lysine, leucine and
tyrosine.
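
The way the number of vessels changes between random and
specified positions can be sketched as follows. This is
an illustrative model only: supports are strings of the
codons coupled so far, a position given as None is
randomized over twenty vessels, and a position given as a
codon is coupled in a single vessel. The function name
synthesize, the codon list and the use of TGT (cysteine)
as the specified codon are assumptions made for the
example.

```python
import random

# One codon per amino acid (illustrative choice), one codon per vessel.
CODONS = [
    "TTT", "GTT", "TCT", "CCT", "TAT", "CAT", "TGT", "CGT", "CTG", "ATG",
    "CAG", "GAG", "ACT", "GCT", "AAT", "GAT", "TGG", "GGG", "ATA", "AAA",
]


def synthesize(n_supports, pattern, codons=CODONS, seed=0):
    """Couple one codon position per entry of pattern.

    None means the position is randomized (mix, divide, couple);
    a codon string means the position is specified (single vessel).
    """
    rng = random.Random(seed)
    supports = [""] * n_supports
    for position in pattern:
        if position is None:
            rng.shuffle(supports)
            vessels = [supports[i::len(codons)] for i in range(len(codons))]
            supports = [
                chain + codon
                for vessel, codon in zip(vessels, codons)
                for chain in vessel
            ]
        else:
            # Dividing is omitted: all supports stay in one vessel.
            supports = [chain + position for chain in supports]
    return supports


if __name__ == "__main__":
    # Random codon, cysteine, random codon, cysteine.
    pool = synthesize(400, [None, "TGT", None, "TGT"])
    print(pool[:5])
```

In this example every oligonucleotide carries the
specified cysteine codon at the second and fourth
positions, while the first and third positions remain
randomized.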

Following codon synthesis, the mixed supports are 
divided into individual reaction vessels for synthesis of 
the next codon to be randomized (Figure 1, step 3) or can 
be used without separation for synthesis of a consecutive 
specified codon. The rounds of synthesis can be repeated 
for each codon to be added until the desired number of 
positions with predetermined or randomized codons are 
obtained. 

Oligonucleotides with the first position codon
being specified can also be synthesized
using the above method. In this case, the first position
codon is synthesized from the appropriate monomers. The
supports are divided into the required number of reaction
vessels needed for synthesis of random codons at the second
position and the rounds of synthesis, mixing and dividing
are performed as described above.

A method of synthesizing oligonucleotides having 
tuplets which are diverse but biased toward a predetermined 
sequence is also described herein. This method employs two 
reaction vessels, one vessel for the synthesis of a 
predetermined sequence and the second vessel for the 
synthesis of a random sequence. This method is 
advantageous when a significant number of codon 
positions, for example, are to be of a specified sequence, 
since it avoids the use of multiple reaction vessels. 
Instead, a mixture of four different monomers, such as 
adenine, guanine, cytosine and thymine nucleotides, is used 
for the first and second monomers in the codon. The codon 
is completed by coupling a mixture of a pair of monomers of 
either guanine and thymine or cytosine and adenine 
nucleotides at the third monomer position. In the second 
vessel, nucleotide monomers are coupled sequentially to 
yield the predetermined codon sequence. Mixing of the two 
supports yields a population of oligonucleotides containing 
both the predetermined codon and the random codons at the 
desired position. Synthesis can proceed by using this 
mixture of supports in a single reaction vessel, for 
example, for coupling additional predetermined codons, or by 
further dividing the mixture into two reaction vessels for 
synthesis of additional random codons. 
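
As an illustration of why a guanine/thymine mixture at the third monomer position is sufficient, the following sketch enumerates the codons produced by four monomers at the first two positions and two at the third, and translates them with the standard genetic code. The names `CODON_TABLE` and `random_codons` are hypothetical helpers introduced only for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def random_codons(third_position="GT"):
    """Codons with any base at positions 1 and 2 and a restricted third base."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", third_position)]


codons = random_codons("GT")
encoded = {CODON_TABLE[c] for c in codons}
stops = sorted(c for c in codons if CODON_TABLE[c] == "*")

print(len(codons))            # number of distinct codons
print(len(encoded - {"*"}))   # number of distinct amino acids encoded
print(stops)                  # stop codons that remain possible
```

Under the standard code, the 32 codons of this mixture cover all twenty amino acids, and the amber codon TAG is the only stop codon that can arise.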

The two reaction vessel method can be used for 
codon synthesis within an oligonucleotide with a 
predetermined tuplet sequence by dividing the support 
mixture into two portions at the desired codon position to 
be randomized. Additionally, this method allows for the 
extent of randomization to be adjusted. For example, 
unequal mixing or dividing of the two supports will change 
the fraction of codons with predetermined sequences 
compared to those with random codons at the desired 
position. Unequal mixing and dividing of supports can be 
useful when there is a need to synthesize random codons at 
a significant number of positions within an oligonucleotide 
of a longer or shorter length. 
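
The effect of unequal dividing can be estimated with a short calculation. The sketch below assumes that the same fraction of supports is sent to the predetermined-sequence vessel at every position and that mixing is thorough, so positions are independent; it also ignores the small chance that the random vessel produces the predetermined codon. The function name and the example values are hypothetical.

```python
from math import comb


def predetermined_distribution(n_positions, fraction_predetermined):
    """Probability of k predetermined codons among n positions (binomial)."""
    p = fraction_predetermined
    return [
        comb(n_positions, k) * p**k * (1 - p) ** (n_positions - k)
        for k in range(n_positions + 1)
    ]


# Ten positions, 90% of the supports sent to the predetermined vessel each time.
dist = predetermined_distribution(10, 0.9)
fully_predetermined = dist[10]      # no position randomized
one_randomized = dist[9]            # exactly one position randomized
print(round(fully_predetermined, 4), round(one_randomized, 4))
```

Shifting the dividing ratio therefore moves the whole population toward or away from the predetermined sequence without changing the number of reaction vessels.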

The extent of randomization can also be adjusted 
by using unequal mixtures of monomers in the first, second 
and third monomer coupling steps of the random codon 
position. The unequal mixtures can be in any or all of the 
coupling steps to yield a population of codons enriched in 
sequences reflective of the monomer proportions. 
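
A minimal sketch of how monomer proportions translate into codon frequencies follows. It assumes that each monomer couples with equal efficiency, so that the frequency of a codon is the product of the proportions of its three monomers; the proportions shown are invented for illustration.

```python
from itertools import product


def codon_frequencies(first, second, third):
    """Expected codon frequencies from per-position monomer proportions.

    Each argument maps a base to its proportion in that coupling step and
    is assumed to sum to 1.
    """
    return {
        a + b + c: first[a] * second[b] * third[c]
        for a, b, c in product(first, second, third)
    }


equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
biased_first = {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}   # enriched in T
third = {"G": 0.5, "T": 0.5}

freqs = codon_frequencies(biased_first, equal, third)
print(len(freqs), freqs["TGT"], freqs["AGT"])
```

With these example proportions, codons beginning with T are seven times more frequent than codons beginning with any other base.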

Synthesis of randomized oligonucleotides is 
performed using methods well known to one skilled in the 
art. Linear coupling of monomers can, for example, be 
accomplished using phosphoramidite chemistry with a 
MilliGen/Biosearch Cyclone Plus automated synthesizer as 
described by the manufacturer (Millipore, Burlington, MA). 
Other chemistries and automated synthesizers can be 
employed as well and are known to one skilled in the art. 

Synthesis of multiple codons can be performed 
without modification to the synthesizer by separately 
synthesizing the codons in individual sets of reactions. 
Alternatively, modification of an automated DNA synthesizer 
can be performed for the simultaneous synthesis of codons 
in multiple reaction vessels. 

In one embodiment, the invention provides a 
plurality of procaryotic cells containing a diverse 
population of expressible oligonucleotides operationally 
linked to expression elements, the expressible 
oligonucleotides having a desirable bias of random codon 
sequences. These oligonucleotides can, in one embodiment, 
be produced from diverse combinations of first and second 
precursor oligonucleotides having a desirable bias of 
random sequences. The invention provides for a method for 
constructing such a plurality of procaryotic cells as well. 

The oligonucleotides synthesized by the above 
methods can be used to express a plurality of random 
soluble peptides having constrained secondary structure in 
solution, diverse but biased toward a predetermined 
sequence or which contain at least one specified codon at 
a predetermined position. The need will determine which 
type of oligonucleotide is to be expressed to give the 
resultant population of random peptides and is known to one 
skilled in the art. Expression can be performed in any 
compatible vector/host system. Such systems include, for 
example, plasmids or phagemids in procaryotes such as E. 
coli, yeast systems, and other eucaryotic systems such as 
mammalian cells, but will be described herein in context 
with its presently preferred embodiment, i.e., expression on 
the surface of filamentous bacteriophage. Filamentous 
bacteriophage can be, for example, M13, f1 and fd. Such 
phage have circular single-stranded genomes and double 
strand replicative DNA forms. Additionally, the peptides 
can also be expressed in soluble or secreted form depending 
on the need and the vector/host system employed. 
Furthermore, this invention provides host cells containing 
the expressible oligonucleotides, the vectors and the 
isolated soluble, stable peptides produced by growing a 
host cell described above under conditions favoring 
expression of the oligonucleotide, and isolating the 
peptide so produced. 

For the purpose of illustration only, expression 
of random peptides on the surface of M13 can be 
accomplished, for example, using the vector system shown in 
Figure 3. Construction of the vectors, enabling one of 
ordinary skill to make them, is explicitly set out in 
Examples I and II. The complete nucleotide sequences are 
given in Figures 5, 6 and 7 (SEQ ID NOS: 1, 2 and 3, 
respectively). This system produces random 
oligonucleotides functionally linked to expression elements 
and to gVIII by combining two smaller oligonucleotide 
portions contained in separate vectors into a single 
vector. The diversity of oligonucleotide species obtained 
by this system or others described herein can be 5 x 10^ or 
greater. Diversity of less than 5 x 10^ can also be 
obtained and will be determined by the need and type of 
random peptides to be expressed. The random combination of 
two precursor portions into a larger oligonucleotide 
increases the diversity of the population several fold and 
has the added advantage of producing oligonucleotides 
larger than what can be synthesized by standard methods. 
Additionally, although the correlation is not known, when 
the number of possible paths an oligonucleotide can take 
during synthesis such as described herein is greater than 
the number of beads, then there will be a correlation 
between the synthesis path and the sequences obtained. By 
combining oligonucleotide populations which are synthesized 
separately, this correlation will be destroyed. Therefore, 
any bias which may be inherent in the synthesis procedures 
will be alleviated by joining two precursor portions into 
a contiguous random oligonucleotide. 

Populations of precursor oligonucleotides to be 
combined into an expressible form are each cloned into 
separate vectors. The two precursor portions which make up 
the combined oligonucleotide correspond to the carboxy and 
amino terminal portions of the expressed peptide. Each 
precursor oligonucleotide can encode either the sense or 
anti-sense strand; which one it encodes will depend on the 
orientation of the expression elements and the gene 
encoding the fusion portion of the protein as well as the 
mechanism used to join the two precursor oligonucleotides. 
For the vectors shown in Figure 3, precursor 
oligonucleotides corresponding to the carboxy terminal 
portion of the peptide encode the sense strand. Those 
corresponding to the amino terminal portion encode the 
anti-sense strand. Oligonucleotide populations are 
inserted between the Eco RI and Sac I restriction enzyme 
sites in M13IX22 and M13IX42 (Figure 3A and B). M13IX42 
(SEQ ID NO: 1) is the vector used for sense strand 
precursor oligonucleotide portions and M13IX22 (SEQ ID 
NO: 2) is used for anti-sense precursor portions. 

The populations of randomized oligonucleotides 
inserted into the vectors are synthesized with Eco RI and 
Sac I recognition sequences flanking opposite ends of the 
random codon sequences. The sites allow annealing and 
ligation of these single strand oligonucleotides into a 
double stranded vector restricted with Eco RI and Sac I. 
Alternatively, the oligonucleotides can be inserted into 
the vector by standard mutagenesis methods. In this latter 
method, single stranded vector DNA is isolated from the 
phage and annealed with random oligonucleotides having 
known sequences complementary to vector sequences. The 
oligonucleotides are extended with DNA polymerase to 
produce double stranded vectors containing the randomized 
oligonucleotides. 

A vector useful for sense strand oligonucleotide 
portions, M13IX42 (Figure 3B), contains downstream and in 
frame with the Eco RI and Sac I restriction sites a 
sequence encoding the pseudo-wild type gVIII product. This 
gene encodes the wild type M13 gVIII amino acid sequence 
but has been changed at the nucleotide level to reduce 
homologous recombination with the wild type gVIII contained 
on the same vector. The wild type gVIII is present to 
ensure that at least some functional, non-fusion coat 
protein will be produced. The inclusion of a wild type 
gVIII therefore reduces the possibility of non-viable phage 
production and biological selection against certain peptide 
fusion proteins. Differential regulation of the two genes 
can also be used to control the relative ratio of the 
pseudo and wild type proteins. 

Also contained downstream and in frame with the 
Eco RI and Sac I restriction sites is an amber stop codon. 
The mutation is located six codons downstream from Sac I 
and therefore lies between the inserted oligonucleotides 
and the gVIII sequence. As was the function of the wild 
type gVIII, the amber stop codon also reduces biological 
selection when combining precursor portions to produce 
expressible oligonucleotides. This is accomplished by 
using a non-suppressor (sup 0) host strain because 
non-suppressor strains will terminate expression after the 
oligonucleotide sequences but before the pseudo gVIII 
sequences. Therefore, the pseudo gVIII will never be 
expressed on the phage surface under these circumstances. 
Instead, only soluble peptides will be produced. 
Expression in a non-suppressor strain can be advantageously 
utilized when one wishes to produce large populations of 
soluble peptides. Stop codons other than amber, such as 
opal and ochre, or molecular switches, such as inducible 
repressor elements, can also be used to unlink peptide 
expression from surface expression. Additional controls 
exist as well and are described below. 
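
The switch between soluble peptide and surface fusion can be illustrated with a small translation routine. This is a sketch only: the construct is an invented stand-in (two random codons, an amber codon, then two codons representing the start of the pseudo gVIII sequence), and the residue inserted at the amber codon is assumed to be glutamine, as with a supE-type suppressor; the text does not specify the suppressor used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, amber_suppressor=False, suppressor_residue="Q"):
    """Translate an in-frame sequence, optionally reading through TAG."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        residue = CODON_TABLE[codon]
        if codon == "TAG" and amber_suppressor:
            residue = suppressor_residue   # read-through in a suppressor host
        if residue == "*":
            break                          # termination at a stop codon
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical construct: random codons + amber codon + gVIII stand-in.
construct = "TGTGAG" + "TAG" + "GCTGAA"
soluble = translate(construct, amber_suppressor=False)
fusion = translate(construct, amber_suppressor=True)
print(soluble, fusion)
```

In the non-suppressor case translation ends before the gVIII stand-in, while in the suppressor case the same DNA yields a single continuous product.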

A vector useful for anti-sense strand 
oligonucleotide portions, M13IX22 (Figure 3A), contains 
the expression elements for the peptide fusion proteins. 
Upstream and in frame with the Sac I and Eco RI sites in 
this vector is a leader sequence for surface expression. 
A ribosome binding site and Lac Z promoter/operator 
elements are present for transcription and translation of 
the peptide fusion proteins. 

Both vectors contain a pair of Fok I restriction 
enzyme sites (Figure 3A and B) for joining together two 
precursor oligonucleotide portions and their vector 
sequences. One site is located at the end of each 
precursor oligonucleotide which is to be joined. The 
second Fok I site within the vectors is located at the end 
of the vector sequences which are to be joined. The 5' 
overhang of this second Fok I site has been altered to 
encode a sequence which is not found in the overhangs 
produced at the first Fok I site within the oligonucleotide 
portions. The two sites allow the cleavage of each 
circular vector into two portions and subsequent ligation 
of essential components within each vector into a single 
circular vector where the two oligonucleotide precursor 
portions form a contiguous sequence (Figure 3C). 
Non-compatible overhangs produced at the two Fok I sites 
allow optimal conditions to be selected for performing 
concatemerization or circularization reactions for joining 
the two vector portions. Such selection of conditions can 
be used to govern the reaction order and therefore increase 
the efficiency of joining. 

Fok I is a restriction enzyme whose recognition 
sequence is distal to the point of cleavage. Placement of 
the recognition sequence distal to the cleavage point is 
important because, if the two were superimposed within the 
oligonucleotide portions to be combined, an invariant 
codon sequence would result at the juncture. To avoid the 
formation of invariant codons at the juncture, Fok I 
recognition sequences can be placed outside of the random 
codon sequence and still be used to restrict within the 
random sequence. Subsequent annealing of the single-strand 
overhangs produced by Fok I and ligation of the two 
oligonucleotide precursor portions allows the juncture to 
be formed. A variety of restriction enzymes restrict DNA 
by this mechanism and can be used instead of Fok I to join 
precursor oligonucleotides without creating invariant 
codon sequences. Such enzymes include, for example, Alw I, 
Bbv I, Bsp MI, Hga I, Hph I, Mbo II, Mnl I, Ple I and 
Sfa NI. One skilled in the art knows how to substitute 
Fok I recognition sequences for alternative enzyme 
recognition sequences such as those above, and use the 
appropriate enzyme for joining precursor oligonucleotide 
portions. 
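
The distal cleavage described above can be sketched in a few lines. The sketch uses the commonly cited Fok I geometry (recognition sequence GGATG, with cuts 9 bases downstream on the recognition strand and 13 bases downstream on the opposite strand, leaving a four base 5' overhang). The function name, the example sequence, and the restriction to sites on the given strand are simplifications made for illustration.

```python
FOKI_SITE = "GGATG"
TOP_OFFSET = 9       # bases 3' of the site before the top-strand cut
BOTTOM_OFFSET = 13   # bases 3' of the site before the bottom-strand cut


def foki_overhang(top_strand):
    """Return (site_end, top_cut, bottom_cut, overhang) for the first site.

    Only a site written on the given (top) strand is considered; sites on
    the opposite strand are ignored in this sketch.
    """
    start = top_strand.find(FOKI_SITE)
    if start == -1:
        return None
    site_end = start + len(FOKI_SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(top_strand):
        return None   # sequence too short to contain both cut positions
    return site_end, top_cut, bottom_cut, top_strand[top_cut:bottom_cut]


# Invented example: site, nine spacer bases, then part of a random region.
sequence = "AA" + "GGATG" + "CCCCCCCCC" + "TGTC" + "AAAA"
result = foki_overhang(sequence)
print(result)
```

The four overhang bases lie entirely outside the recognition sequence, which is why the juncture can fall within random codons without fixing their sequence.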

Although the sequences of the precursor 
oligonucleotides are random and will invariably have 
oligonucleotides within the two precursor populations whose 
sequences are sufficiently complementary to anneal after 
cleavage, the efficiency of annealing can be increased by 
ensuring that the single-strand overhangs within one 
precursor population will have a complementary sequence 
within the second precursor population. This can be 
accomplished by synthesizing a non-degenerate series of 
known sequences at the Fok I cleavage site coding for each 
of the twenty amino acids. Since the Fok I cleavage site 
contains a four base overhang, forty different sequences 
are needed to randomly encode all twenty amino acids. For 
example, if two precursor populations of ten codons in 
length are to be combined, then after the ninth codon 
position is synthesized, the mixed population of supports 
is divided into forty reaction vessels for each of the 
populations and complementary sequences for each of the 
corresponding reaction vessels between populations are 
independently synthesized. The sequences are shown in 
Tables III and VI of Example I where the oligonucleotides 
on columns 1R through 40R form complementary overhangs with 
the oligonucleotides on the corresponding columns 1L 
through 40L once cleaved. The degenerate X positions in 
Table VI are necessary to maintain the reading frame once 
the precursor oligonucleotide portions are joined. 
However, restriction enzymes which produce a blunt 
end, such as Mnl I, can alternatively be used in place of 
Fok I to alleviate the degeneracy introduced in maintaining 
the reading frame. 

The last feature exhibited by each of the vectors 
is an amber stop codon located in an essential coding 
sequence within the vector portion lost during combining 
(Figure 3C). The amber stop codon is present to select for 
viable phage produced from only the proper combination of 
precursor oligonucleotides and their vector sequences into 
a single vector species. Other non-sense mutations or 
selectable markers can work as well. 

The combining step randomly brings together 
different precursor oligonucleotides within the two 
populations into a single vector (Figure 3C; M13IX). For 
example, the vector sequences donated from each independent 
vector described above, M13IX22 and M13IX42, are necessary 
for production of viable phage. Also, since the expression 
elements are contained in M13IX22 and the gVIII sequences 
are contained in M13IX42, expression of functional 
gVIII-peptide fusion proteins cannot be accomplished until 
the sequences are linked as shown in M13IX. 

WO 94/11496                                  PCT/US93/10850 

The combining step is performed by restricting 
each population of vectors containing randomized 
oligonucleotides with Fok I, mixing and ligating (Figure 
3C). Any vectors generated which contain an amber stop 
codon will not produce viable phage when introduced into a 
non-suppressor strain (Figure 3D). Therefore, only the 
sequences which do not contain an amber stop codon will 
make up the final population of vectors contained in the 
library. These vector sequences are the sequences required 
for surface expression of randomized peptides. By 
analogous methodology, more than two vector portions can be 
combined into a single vector which expresses random 
peptides. 

Surface expression of the random peptide library 
is performed in an amber suppressor strain. As described 
above, the amber stop codon between the random codon 
sequence and the gVIII sequence unlinks the two components 
in a non-suppressor strain. Isolating the phage produced 
from the non-suppressor strain and infecting a suppressor 
strain will link the random codon sequences to the gVIII 
sequence during expression (Figure 3E). Culturing the 
suppressor strain after infection allows the expression of 
all peptide species within the library as gVIII-peptide 
fusion proteins. Alternatively, the DNA can be isolated 
from the non-suppressor strain and then introduced into a 
suppressor strain to accomplish the same effect. 

The level of expression of gVIII-peptide fusion 
proteins can additionally be controlled at the 
transcriptional level. The gVIII-peptide fusion proteins 
are under the inducible control of the Lac Z 
promoter/operator system. Other inducible promoters can 
work as well and are known by one skilled in the art. For 
high levels of surface expression, the suppressor library 
is cultured in an inducer of the Lac Z promoter such as 
isopropylthio-β-galactoside (IPTG). Inducible control is 
beneficial because biological selection against 
non-functional gVIII-peptide fusion proteins can be 
minimized by culturing the library under non-expressing 
conditions. Expression can then be induced only at the 
time of screening to ensure that the entire population of 
oligonucleotides within the library is accurately 
represented on the phage surface. Inducible control can 
also be used to control the valency of the peptide on the 
phage surface. 

The surface expression library is screened for 
specific peptides which bind ligand binding proteins by 
standard affinity isolation procedures. Such methods 
include, for example, panning, affinity chromatography and 
solid phase blotting procedures. Panning as described by 
Parmley and Smith, Gene 73:305-318 (1988), which is 
incorporated herein by reference, is preferred because high 
titers of phage can be screened easily, quickly and in 
small volumes. Furthermore, this procedure can select 
minor peptide species within the population, which 
otherwise would have been undetectable, and amplify them to 
substantially homogeneous populations. The selected peptide 
sequences can be determined by sequencing the nucleic acid 
encoding such peptides after amplification of the phage 
population. 

The invention provides a plurality of procaryotic 
cells containing a diverse population of oligonucleotides 
encoding soluble peptides having constrained secondary 
structure in solution, the oligonucleotides being 
operationally linked to expression sequences. The 
invention provides for methods of constructing such 
populations of cells as well. 

Random oligonucleotides synthesized by any of the 
methods described previously can also be expressed on the 
surface of filamentous bacteriophage, such as M13, for 
example, without the joining together of precursor 
oligonucleotides. A vector such as that shown in Figure 4, 
M13IX30, can be used. This vector exhibits all the 
functional features of the combined vector shown in Figure 
3C for surface expression of gVIII-peptide fusion proteins. 
The complete nucleotide sequence for M13IX30 (SEQ ID NO: 3) 
is shown in Figure 7. 

For example, M13IX30 contains a wild type gVIII 
for phage viability and a pseudo gVIII sequence for peptide 
fusions. The vector also contains in frame restriction 
sites for cloning random peptides. The cloning sites in 
this vector are Xho I, Stu I and Spe I. Oligonucleotides 
should therefore be synthesized with the appropriate 
complementary ends for annealing and ligation or 
insertional mutagenesis. Alternatively, the appropriate 
termini can be generated by PCR technology. Between the 
restriction sites and the pseudo gVIII sequence is an 
in-frame amber stop codon, again ensuring complete 
viability of phage in constructing and manipulating the 
library. Expression and screening are performed as 
described above for the surface expression library of 
oligonucleotides generated from precursor portions. 

Thus, peptides can be selected that are capable 
of being bound by a ligand binding protein from a 
population of random peptides by (a) operationally linking 
a diverse population of oligonucleotides having a desirable 
bias of random codon sequences to expression elements; (b) 
introducing said population of vectors into a compatible 
host under conditions sufficient for expressing said 
population of random peptides; and (c) determining the 
peptides which bind to said binding protein. Also provided 
is a method for determining the encoding nucleic acid 
sequence of such selected peptides. 

The following examples are intended to illustrate, but 
not limit, the invention. 

EXAMPLE I 

Isolation and Characterization of Peptide Ligands Generated 
From Right and Left Half Random Oligonucleotides 

This example shows the synthesis of random 
oligonucleotides and the construction and expression of 
surface expression libraries of the encoded randomized 
peptides. The random peptides of this example derive from 
the mixing and joining together of two random 
oligonucleotides. Also demonstrated is the isolation and 
characterization of peptide ligands and their corresponding 
nucleotide sequence for specific binding proteins. 

Synthesis of Random Oligonucleotides 

The synthesis of two randomized oligonucleotides 
which correspond to smaller portions of a larger randomized 
oligonucleotide is shown below. Each of the two smaller 
portions makes up one-half of the larger oligonucleotide. 
The populations of randomized oligonucleotides constituting 
each half are designated the right and left halves. Each 
population of right and left halves is ten codons in 
length with twenty random codons at each position. The 
right half corresponds to the sense sequence of the 
randomized oligonucleotides and encodes the carboxy terminal 
half of the expressed peptides. The left half corresponds 
to the anti-sense sequence of the randomized 
oligonucleotides and encodes the amino terminal half of the 
expressed peptides. The right and left halves of the 
randomized oligonucleotide populations are cloned into 
separate vector species and then mixed and joined so that 
the right and left halves come together in random 
combination to produce a single expression vector species 
which contains a population of randomized oligonucleotides 
twenty codons in length. Electroporation of the vector 
population into an appropriate host produces filamentous 
phage which express the random peptides on their surface. 
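
For orientation, the theoretical number of sequences implied by these figures can be computed directly. This is simple arithmetic on the values stated above (twenty codons per position, ten positions per half); the library actually obtained would be limited by practical factors such as the number of beads and of transformed cells, which this sketch does not model.

```python
CODONS_PER_POSITION = 20
HALF_LENGTH = 10

# Distinct sequences possible within one half (right or left).
half_diversity = CODONS_PER_POSITION ** HALF_LENGTH

# Distinct sequences possible once any right half can join any left half.
combined_diversity = half_diversity * half_diversity

print(f"{half_diversity:.3e}")
print(f"{combined_diversity:.3e}")
```

The combined figure is the square of the figure for one half, which illustrates why joining two separately synthesized populations gives access to sequences that a single synthesis could not sample.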

The reaction vessels for oligonucleotide 
synthesis were obtained from the manufacturer of the 
automated synthesizer (Millipore, Burlington, MA; supplier 
of MilliGen/Biosearch Cyclone Plus Synthesizer). The 
vessels were supplied as packages containing empty reaction 
columns (1 μmole), frits, crimps and plugs 
(MilliGen/Biosearch catalog # GEN 860458). Derivatized and 
underivatized control pore glass, phosphoramidite 
nucleotides, and synthesis reagents were also obtained from 
MilliGen/Biosearch. Crimper and decrimper tools were 
obtained from Fisher Scientific Co., Pittsburgh, PA 
(Catalog numbers 06-406-20 and 06-406-25A, respectively). 

Ten reaction columns were used for right half 
synthesis of random oligonucleotides ten codons in length. 
The oligonucleotides have 5 monomers at their 3' end of the 
sequence 5'GAGCT3' and 8 monomers at their 5' end of the 
sequence 5'AATTCCAT3'. The synthesizer was fitted with a 
column derivatized with a thymine nucleotide (T-column, 
MilliGen/Biosearch # 0615.50) and was programmed to 
synthesize the sequences shown in Table I for each of ten 
columns in independent reaction sets. The sequence of the 
last three monomers (from right to left since synthesis 
proceeds 3' to 5') encodes the indicated amino acids: 

Table I 

                   Sequence 
Column             (5' to 3')          Amino Acids 

column 1R          (T/G)TTGAGCT        Phe and Val 
column 2R          (T/C)CTGAGCT        Ser and Pro 
column 3R          (T/C)ATGAGCT        Tyr and His 
column 4R          (T/C)GTGAGCT        Cys and Arg 
column 5R          (C/A)TGGAGCT        Leu and Met 
column 6R          (C/G)AGGAGCT        Gln and Glu 
column 7R          (A/G)CTGAGCT        Thr and Ala 
column 8R          (A/G)ATGAGCT        Asn and Asp 
column 9R          (T/G)GGGAGCT        Trp and Gly 
column 10R         A(T/A)AGAGCT        Ile and Lys 


where the two monomers in parentheses denote a single 
monomer position within the codon and indicate that an 
equal mixture of each monomer was added to the reaction for 
coupling. The monomer coupling reactions for each of the 
10 columns were performed as recommended by the 
manufacturer (amidite version S1.06, # 8400-050990, scale 
1 μM). After the last coupling reaction, the columns were 
washed with acetonitrile and lyophilized to dryness. 
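
The codon pairs of Table I can be checked against the standard genetic code with a short script. The sketch below expands each degenerate codon (the three monomers preceding GAGCT) and translates the results; the helper names `expand` and `CODON_TABLE` are hypothetical, and the codon patterns are transcribed from Table I.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degenerate codons of Table I, one entry per reaction column.
TABLE_I = {
    "1R": "(T/G)TT", "2R": "(T/C)CT", "3R": "(T/C)AT", "4R": "(T/C)GT",
    "5R": "(C/A)TG", "6R": "(C/G)AG", "7R": "(A/G)CT", "8R": "(A/G)AT",
    "9R": "(T/G)GG", "10R": "A(T/A)A",
}


def expand(pattern):
    """Expand a pattern such as "A(T/A)A" into its individual sequences."""
    parts = re.findall(r"\(([ACGT])/([ACGT])\)|([ACGT])", pattern)
    choices = [(a, b) if a else (single,) for a, b, single in parts]
    return ["".join(combo) for combo in product(*choices)]


residues_by_column = {
    column: [CODON_TABLE[codon] for codon in expand(pattern)]
    for column, pattern in TABLE_I.items()
}
all_residues = {r for pair in residues_by_column.values() for r in pair}
print(residues_by_column)
print(len(all_residues))
```

Each column contributes two amino acids (in one-letter code), and the ten columns together cover twenty distinct amino acids with no stop codon.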

Following synthesis, the plugs were removed from 
each column using a decrimper and the reaction products 
were poured into a single weigh boat. Initially the bead 
mass increases due to the weight of the monomers; however, 
at later rounds of synthesis material is lost. In either 
case, the material was equalized with underivatized control 
pore glass and mixed thoroughly to obtain a random 
distribution of all twenty codon species. The reaction 
products were then aliquotted into 10 new reaction columns 
by removing 25 mg of material at a time and placing it into 
separate reaction columns. Alternatively, the reaction 
products can be aliquotted by suspending the beads in a 
liquid that is dense enough for the beads to remain 
dispersed, preferably a liquid that is equal in density to 
the beads, and then aliquotting equal volumes of the 
suspension into separate reaction columns. The lip on the 
inside of the columns where the frits rest was cleared of 
material using vacuum suction with a syringe and 25 G 
needle. New frits were placed onto the lips, and the plugs 
were fitted into the columns and crimped into place 
using a crimper. 

Synthesis of the second codon position was 
achieved using the above 10 columns containing the random 
mixture of reaction products from the first codon 
synthesis. The monomer coupling reactions for the second 
codon position are shown in Table II. An A in the first 
(3'-most) position means that any monomer can be programmed 
into the synthesizer. At that position, the first monomer 
is not coupled by the synthesizer since the software 
assumes that the monomer is already attached to the column. 
An A also denotes that the columns from the previous codon 
synthesis should be placed on the synthesizer for use in 
the present synthesis round. Reactions were again 
sequentially repeated for each column as shown in Table II 
and the reaction products washed and dried as described 
above. 

Table II 



                   Sequence 
Column             (5' to 3')          Amino Acids 

column 1R          (T/G)TTA            Phe and Val 
column 2R          (T/C)CTA            Ser and Pro 
column 3R          (T/C)ATA            Tyr and His 
column 4R          (T/C)GTA            Cys and Arg 
column 5R          (C/A)TGA            Leu and Met 
column 6R          (C/G)AGA            Gln and Glu 
column 7R          (A/G)CTA            Thr and Ala 
column 8R          (A/G)ATA            Asn and Asp 
column 9R          (T/G)GGA            Trp and Gly 
column 10R         A(T/A)AA            Ile and Lys 



Randomization of the second codon position was 
achieved by removing the reaction products from each of the 
columns and thoroughly mixing the material. The material 
was again divided into new reaction columns and prepared 
for monomer coupling reactions as described above. 

Random synthesis of the next seven codons 
(positions 3 through 9) proceeded identically to the cycle 
described above for the second codon position and again 
used the monomer sequences of Table II. Each of the newly 
repacked columns containing the random mixture of reaction 
products from synthesis of the previous codon position was 
used for the synthesis of the subsequent codon position. 
After synthesis of the codon at position nine and mixing of 
the reaction products, the material was divided and 
repacked into 40 different columns and the monomer 
sequences shown in Table III were coupled to each of the 40 
columns in independent reactions. The oligonucleotides 
from each of the 40 columns were mixed once more and 
cleaved from the control pore glass as recommended by the 
manufacturer. 

Table III 



Column             Sequence (5' to 3') 

column 1R          AATTCTTTTA 
column 2R          AATTCTGTTA 
column 3R          [illegible] 
column 4R          [illegible] 
column 5R          [illegible] 
column 6R          AATTCTCCTA 
column 7R          AATTCGTCTA 
column 8R          AATTCGCCTA 
column 9R          AATTCTTATA 
column 10R         AATTCTCATA 
column 11R         AATTCGTATA 
column 12R         AATTCGCATA 
column 13R         AATTCTTGTA 
column 14R         AATTCTCGTA 
column 15R         AATTCGTGTA 
column 16R         AATTCGCGTA 
column 17R         AATTCTCTGA 
column 18R         AATTCTATGA 
column 19R         AATTCGCTGA 
column 20R         AATTCGATGA 
column 21R         AATTCTCAGA 
column 22R         AATTCTGAGA 
column 23R         AATTCGCAGA 
column 24R         AATTCGGAGA 
column 25R         AATTCTACTA 
column 26R         AATTCTGCTA 
column 27R         AATTCGACTA 
column 28R         AATTCGGCTA 
column 29R         AATTCTAATA 
column 30R         AATTCTGATA 
column 31R         AATTCGAATA 
column 32R         AATTCGGATA 
column 33R         AATTCTTGGA 
column 34R         AATTCTGGGA 
column 35R         AATTCGTGGA 
column 36R         AATTCGGGGA 
column 37R         AATTCTATAA 
column 38R         AATTCTAAAA 
column 39R         AATTCGATAA 
column 40R         AATTCGAAAA 
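
The legible rows of Table III are consistent with a regular pattern: the fixed 5' sequence AATTC, one additional base (T or G), one of the twenty codons of Table I, and a trailing A placeholder as in Table II. The sketch below generates forty sequences under that assumed pattern. The ordering rule and the function name are inferred for illustration and are not stated in the text, so rows that are not legible in the source should not be treated as confirmed by this output.

```python
# The two codons coupled in each Table I column, in column order.
CODON_PAIRS = [
    ("TTT", "GTT"), ("TCT", "CCT"), ("TAT", "CAT"), ("TGT", "CGT"),
    ("CTG", "ATG"), ("CAG", "GAG"), ("ACT", "GCT"), ("AAT", "GAT"),
    ("TGG", "GGG"), ("ATA", "AAA"),
]


def table_iii_sequences():
    """Build forty sequences: AATTC + (T or G) + codon + placeholder A."""
    sequences = {}
    column = 1
    for pair in CODON_PAIRS:
        for extra_base in "TG":
            for codon in pair:
                sequences[f"{column}R"] = "AATTC" + extra_base + codon + "A"
                column += 1
    return sequences


sequences = table_iii_sequences()
for name in ("1R", "6R", "20R", "40R"):
    print(name, sequences[name])
```

Twenty codons combined with two choices for the additional base account for the forty columns, which matches the count given in the text for a four base overhang.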

Left half synthesis of random oligonucleotides
proceeded similarly to the right half synthesis. This half
of the oligonucleotide corresponds to the anti-sense
sequence of the encoded randomized peptides. Thus, the
complementary sequences of the codons in Tables I through
III are synthesized. The left half oligonucleotides also
have 5 monomers at their 3' end with the sequence 5'GAGCT3'
and 8 monomers at their 5' end with the sequence
5'AATTCCAT3'. The rounds of synthesis, washing, drying,
mixing, and dividing are as described above.
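
Because the left half is synthesized as the anti-sense strand, each left half sequence is the reverse complement of the sense sequence it stands for. The following is a minimal Python sketch of that relationship, not part of the original disclosure; the function name `reverse_complement` is illustrative.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    # Fixed ends of the left half oligonucleotides, as given in the text.
    for label, seq in [("5' end", "AATTCCAT"), ("3' end", "GAGCT")]:
        print(label, seq, "->", reverse_complement(seq))
```

Read on the opposite strand, the 5' end AATTCCAT becomes ATGGAATT, which contains an ATG triplet.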

For the first codon position, the synthesizer was
fitted with a T-column and programmed to synthesize the
sequences shown in Table IV for each of ten columns in
independent reaction sets. As with right half synthesis,
the sequence of the last three monomers (from right to
left) encodes the indicated amino acids:

Table IV

Column        Sequence (5' to 3')   Amino Acids

column 1L     AA(A/C)GAGCT          Phe and Val
column 2L     AG(A/G)GAGCT          Ser and Pro
column 3L     AT(A/G)GAGCT          Tyr and His
column 4L     AC(A/G)GAGCT          Cys and Arg
column 5L     CA(G/T)GAGCT          Leu and Met
column 6L     CT(G/C)GAGCT          Gln and Glu
column 7L     AG(T/C)GAGCT          Thr and Ala
column 8L     AT(T/C)GAGCT          Asn and Asp
column 9L     CC(A/C)GAGCT          Trp and Gly
column 10L    T(A/T)TGAGCT          Ile and Lys
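
The amino acid assignments for the anti-sense codon patterns can be checked mechanically. The sketch below is an illustration only: it expands each degenerate pattern, takes the reverse complement, and translates with the standard genetic code. The helper names (`expand`, `decode_antisense`) and the `TABLE_IV` list are assumptions introduced here, and the fixed GAGCT tail is omitted from each pattern.

```python
import itertools
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def expand(pattern):
    """Expand a pattern such as 'AA(A/C)' into every concrete sequence."""
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", pattern)
    choices = [alts.split("/") if alts else [base] for alts, base in tokens]
    return ["".join(combo) for combo in itertools.product(*choices)]


def decode_antisense(pattern):
    """One-letter amino acids encoded by an anti-sense codon pattern."""
    residues = set()
    for antisense in expand(pattern):
        sense = "".join(COMPLEMENT[b] for b in reversed(antisense))
        residues.add(CODON_TABLE[sense])
    return residues


# Anti-sense codon patterns for columns 1L-10L (GAGCT tail omitted).
TABLE_IV = ["AA(A/C)", "AG(A/G)", "AT(A/G)", "AC(A/G)", "CA(G/T)",
            "CT(G/C)", "AG(T/C)", "AT(T/C)", "CC(A/C)", "T(A/T)T"]

if __name__ == "__main__":
    for n, pattern in enumerate(TABLE_IV, start=1):
        print(f"column {n}L {pattern}: {sorted(decode_antisense(pattern))}")
```

Under this reading the ten columns together cover all twenty amino acids, two per column, with no stop codons.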



Following washing and drying, the plugs for each
column were removed, mixed and aliquoted into ten new
reaction columns as described above. Synthesis of the
second codon position was achieved using these ten columns
containing the random mixture of reaction products from the
first codon synthesis. The monomer coupling reactions for
the second codon position are shown in Table V.

Table V

Column        Sequence (5' to 3')   Amino Acids

column 1L     AA(A/C)A              Phe and Val
column 2L     AG(A/G)A              Ser and Pro
column 3L     AT(A/G)A              Tyr and His
column 4L     AC(A/G)A              Cys and Arg
column 5L     CA(G/T)A              Leu and Met
column 6L     CT(G/C)A              Gln and Glu
column 7L     AG(T/C)A              Thr and Ala
column 8L     AT(T/C)A              Asn and Asp
column 9L     CC(A/C)A              Trp and Gly
column 10L    T(A/T)TA              Ile and Lys

Again, randomization of the second codon position
was achieved by removing the reaction products from each of
the columns and thoroughly mixing the beads. The beads
were repacked into ten new reaction columns.
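
The effect of repeated mixing and dividing can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the synthesis protocol itself: each simulated item is a single growing chain rather than a bead carrying many chains, strand orientation is ignored, the two monomers in a mixed coupling are assumed to incorporate with equal probability, and the residue pairs in `COLUMN_PAIRS` are those obtained by decoding the ten anti-sense codon patterns with the standard genetic code. All names are hypothetical.

```python
import random

# Residue pairs coupled in columns 1L-10L (one-letter codes), obtained by
# decoding the anti-sense codon patterns with the standard genetic code.
COLUMN_PAIRS = ["FV", "SP", "YH", "CR", "LM", "QE", "TA", "ND", "WG", "IK"]


def split_and_mix(n_chains, n_positions, seed=0):
    """Simulate rounds of mixing, dividing into ten columns, and coupling."""
    rng = random.Random(seed)
    chains = ["" for _ in range(n_chains)]
    for _ in range(n_positions):
        rng.shuffle(chains)  # thorough mixing of the pooled material
        for i in range(n_chains):
            # Deal the mixed material evenly into the ten columns.
            pair = COLUMN_PAIRS[i % len(COLUMN_PAIRS)]
            # Mixed-monomer coupling: either residue of the pair is added.
            chains[i] += rng.choice(pair)
    return chains


if __name__ == "__main__":
    library = split_and_mix(n_chains=2000, n_positions=9)
    print(len(set(library)), "distinct sequences among", len(library), "chains")
```

Because the pool is remixed before every round, the residue added at one position is independent of the residues added at earlier positions, which is what makes the resulting codon sequence random.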

Random synthesis of the next seven codon
positions proceeded identically to the cycle described
above for the second codon position and again used the
monomer sequences of Table V. After synthesis of the codon
at position nine and mixing of the reaction products, the
material was divided and repacked into 40 different columns
and the monomer sequences shown in Table VI were coupled to
each of the 40 columns in independent reactions.

Table VI

Column           Sequence (5' to 3')

column 1L        AATTCCATAAAAXXA
column 2L        AATTCCATAAACXXA
column 3L        AATTCCATAACAXXA
column 4L        AATTCCATAACCXXA
column 5L        AATTCCATAGAAXXA
column 6L        AATTCCATAGACXXA
column 7L        AATTCCATAGGAXXA
column 8L        AATTCCATAGGCXXA
column 9L        AATTCCATATAAXXA
column 10L       AATTCCATATACXXA
column 11L       AATTCCATATGAXXA
column 12L       AATTCCATATGCXXA
column 13L       AATTCCATACAAXXA
column 14L       AATTCCATAGACXXA
column 15L       AATTCCATACGAXXA
column 16L       AATTCCATACGCXXA
column 17L       AATTCCATCAGAXXA
column 18L       AATTCCATCAGCXXA
column 19L       AATTCCATCATAXXA
column 20L       AATTCCATCATCXXA
column 21L       AATTCCATCTGAXXA
column 22L       AATTCCATCTGCXXA
column 23L       AATTCCATCTGAXXA
column 24L       AATTCCATCTGCXXA
columns 25L-34L  [sequences illegible in source]
column 35L       AATTCCATCCCAXXA
column 36L       AATTCCATCCCCXXA
column 37L       AATTCCATTATAXXA
column 38L       AATTCCATTATCXXA
column 39L       AATTCCATTTTAXXA
column 40L       AATTCCATTTTCXXA



The first two monomers denoted by an "X" represent an equal
mixture of all four nucleotides at that position. This is
necessary to retain a relatively unbiased codon sequence at
the junction between right and left half oligonucleotides.
The above right and left half random oligonucleotides were
cleaved and purified from the supports and used in
constructing the surface expression libraries below.

Vector Construction 

Two M13-based vectors, M13IX42 (SEQ ID NO: 1) and
M13IX22 (SEQ ID NO: 2), were constructed for the cloning
and propagation of right and left half populations of
random oligonucleotides, respectively. The vectors were
specially constructed to facilitate the random joining and
subsequent expression of right and left half
oligonucleotide populations. Each vector within the
population contains one right and one left half
oligonucleotide from the population joined together to form
a single contiguous oligonucleotide with random codons
which is twenty-two codons in length. The resultant
population of vectors is used to construct a surface
expression library.

M13IX42, or the right-half vector, was
constructed to harbor the right half populations of
randomized oligonucleotides. M13mp18 (Pharmacia,
Piscataway, NJ) was the starting vector. This vector was
genetically modified to contain, in addition to the encoded
wild type M13 gene VIII already present in the vector: (1)
a pseudo-wild type M13 gene VIII sequence with a stop codon
(amber) placed between it and an Eco RI-Sac I cloning site
for randomized oligonucleotides; (2) a pair of Fok I sites
to be used for joining with M13IX22, the left-half vector;
(3) a second amber stop codon placed on the opposite side
of the vector from the portion being combined with the
left-half vector; and (4) various other mutations to remove
redundant restriction sites and the amino terminal portion
of Lac Z.

The pseudo-wild type M13 gene VIII was used for
surface expression of random peptides. The pseudo-wild
type gene encodes the identical amino acid sequence as that
of the wild type gene; however, the nucleotide sequence has
been altered so that only 63% identity exists between this
gene and the encoded wild type gene VIII. Modification of
the gene VIII nucleotide sequence used for surface
expression reduces the possibility of homologous
recombination with the wild type gene VIII contained on the
same vector. Additionally, the wild type M13 gene VIII was
retained in the vector system to ensure that at least some
functional, non-fusion coat protein would be produced. The
inclusion of wild type gene VIII therefore reduces the
possibility of non-viable phage production from the random
peptide fusion genes.
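
Nucleotide identity between two equal-length coding sequences can be computed position by position. The sketch below is illustrative only; the function name `percent_identity` and the nine-base example sequences are hypothetical and are not taken from gene VIII.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical example: two ways of encoding Ala-Glu-Gly that differ
    # only at the third (wobble) position of each codon.
    original = "GCTGAAGGC"
    recoded = "GCCGAGGGT"
    print(f"{percent_identity(original, recoded):.1f}% identity")
```

Changing only third-position bases, as in this example, leaves the encoded amino acids unchanged while lowering nucleotide identity, which is the kind of alteration that reduces the opportunity for homologous recombination.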

The pseudo-wild type gene VIII was constructed by
chemically synthesizing a series of oligonucleotides which
encode both strands of the gene. The oligonucleotides are
presented in Table VII (SEQ ID NOS: 7 through 16).

TABLE VII

Pseudo-Wild Type Gene VIII Oligonucleotide Series

Top Strand
Oligonucleotides    Sequence (5' to 3')

VIII 03             GATCC TAG GCT GAA GGC GAT
                    GAC CCT GCT AAG GCT GC
VIII 04             A TTC AAT AGT TTA GAG GCA
                    AGT GCT ACT GAG TAG A
VIII 05             TT GGC TAC GCT TGG GCT ATG
                    GTA GTA GTT ATA GTT
VIII 06             GGT GCT ACC ATA GGG ATT AAA
                    TTA TTC AAA AAG TT
VIII 07             T ACG AGC AAG GCT TCT TA

Bottom Strand
Oligonucleotides    Sequence (5' to 3')

VIII 08             AGC TTA AGA AGC CTT GCT CGT
                    AAA CTT TTT GAA TAA TTT
VIII 09             AAT CCC TAT GGT AGC ACC AAC
                    TAT AAC TAC TAC CAT
VIII 10             AGC CCA AGC GTA GCC AAT GTA
                    CTC AGT AGC ACT TG
VIII 11             C CTG TAA ACT ATT GAA TGC
                    AGC CTT AGC AGG GTC
VIII 12             ATC GCC TTC AGC CTA G

Except for the terminal oligonucleotides VIII 03
(SEQ ID NO: 7) and VIII 08 (SEQ ID NO: 12), the above
oligonucleotides (oligonucleotides VIII 04-VIII 07 and 09-12
(SEQ ID NOS: 8 through 11 and 13 through 16)) were mixed
at 200 ng each in 10 µl final volume and phosphorylated
with T4 polynucleotide kinase (Pharmacia, Piscataway, NJ)
with 1 mM ATP at 37°C for 1 hour. The reaction was stopped
at 65°C for 5 minutes. Terminal oligonucleotides were
added to the mixture and annealed into double-stranded form
by heating to 65°C for 5 minutes, followed by cooling to
room temperature over a period of 30 minutes. The annealed
oligonucleotides were ligated together with 1.0 U of T4 DNA
ligase (BRL). The annealed and ligated oligonucleotides
yield a double-stranded DNA flanked by a Bam HI site at its
5' end and by a Hind III site at its 3' end. A
translational stop codon (amber) immediately follows the
Bam HI site. The gene VIII sequence begins with the codon
GAA (Glu) two codons 3' to the stop codon. The
double-stranded insert was phosphorylated using T4 DNA kinase
(Pharmacia, Piscataway, NJ) and ATP (10 mM Tris-HCl, pH
7.5, 10 mM MgCl2) and cloned in frame with the Eco RI and
Sac I sites within the M13 polylinker. To do so, M13mp18
was digested with Bam HI (New England Biolabs, Beverly,
MA) and Hind III (New England Biolabs) and combined at a
molar ratio of 1:10 with the double-stranded insert. The
ligations were performed at 16°C overnight in 1X ligase
buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM
ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (New
England Biolabs). The ligation mixture was transformed
into a host and screened for positive clones using standard
procedures in the art.

Several mutations were generated within the
right-half vector to yield functional M13IX42. The
mutations were generated using the method of Kunkel et al.,
Meth. Enzymol. 154:367-382 (1987), which is incorporated
herein by reference, for site-directed mutagenesis. The
reagents, strains and protocols were obtained from a Bio
Rad Mutagenesis kit (Bio Rad, Richmond, CA) and mutagenesis
was performed as recommended by the manufacturer.

A Fok I site used for joining the right and left
halves was generated 8 nucleotides 5' to the unique Eco RI
site using the oligonucleotide
5'-CTCGAATTCGTACATCCTGGTCATAGC-3' (SEQ ID NO: 17). The
second Fok I site retained in the vector is naturally
encoded at position 3547; however, the sequence within the
overhang was changed to encode CTTC. Two Fok I sites were
removed from the vector at positions 239 and 7244 of
M13mp18, as well as the Hind III site at the end of the
pseudo gene VIII sequence, using the mutant oligonucleotides
5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 18) and
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 19), respectively.
New Hind III and Mlu I sites were also introduced at
positions 3919 and 3951 of M13IX42. The oligonucleotides
used for this mutagenesis had the sequences
5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 20) and
5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 21),
respectively. The amino terminal portion of Lac Z was
deleted by oligonucleotide-directed mutagenesis using the
mutant oligonucleotide
5'-GCGGGCCTCTTCGCTATTGCTTAAGAAGCCTTGCT-3' (SEQ ID NO: 22).
This deletion also removed a third M13mp18-derived Fok I
site. The distance between the Eco RI and Sac I sites was
increased to ensure complete double digestion by inserting
a spacer sequence. The spacer sequence was inserted using
the oligonucleotide
5'-TTCAGCCTAGGATCCGCCGAGCTCTCCTACCTGCGAATTCGTACATCC-3'
(SEQ ID NO: 23). Finally, an amber stop codon was placed at
position 4492 using the mutant oligonucleotide
5'-TGGATTATACTTCTAAATAATGGA-3' (SEQ ID NO: 24). The amber
stop codon is used as a biological selection to ensure the
proper recombination of vector sequences to bring together
right and left halves of the randomized oligonucleotides.
In constructing the above mutations, all changes made in an
M13 coding region were performed such that the amino acid
sequence remained unaltered. It should be noted that
several mutations within M13mp18 were found which differed
from the published sequence. Where known, these sequence
differences are recorded herein as found and therefore may
not correspond exactly to the published sequence of
M13mp18.
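
The restriction sites carried by the mutagenic oligonucleotides can be located by a simple motif search. The sketch below is an illustration, not part of the original procedure; it uses the commonly cited recognition sequences for the enzymes named in the text and reports 0-based start positions. The names `SITES` and `find_sites` are assumptions introduced here.

```python
# Recognition sequences (5' to 3') for the enzymes named in the text.
SITES = {
    "Eco RI": "GAATTC",
    "Sac I": "GAGCTC",
    "Hind III": "AAGCTT",
    "Bam HI": "GGATCC",
    "Mlu I": "ACGCGT",
    "Fok I": "GGATG",  # non-palindromic, so CATCC is searched as well
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def find_sites(seq):
    """Map enzyme name to sorted 0-based start positions in seq."""
    seq = seq.upper().replace(" ", "")
    hits = {}
    for enzyme, site in SITES.items():
        positions = set()
        # A site on the opposite strand appears as its reverse complement.
        for motif in {site, reverse_complement(site)}:
            start = seq.find(motif)
            while start != -1:
                positions.add(start)
                start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = sorted(positions)
    return hits


if __name__ == "__main__":
    oligos = {
        "SEQ ID NO: 17": "CTCGAATTCGTACATCCTGGTCATAGC",
        "SEQ ID NO: 20": "ATATATTTTAGTAAGCTTCATCTTCT",
        "SEQ ID NO: 21": "GACAAAGAACGCGTGAAAACTTT",
    }
    for name, seq in oligos.items():
        print(name, find_sites(seq))
```

For SEQ ID NO: 17 the search finds an Eco RI site and, on the opposite strand, a Fok I recognition sequence; SEQ ID NOS: 20 and 21 carry the Hind III and Mlu I sites, respectively.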

The sequence of the resultant vector, M13IX42, is
shown in Figure 5 (SEQ ID NO: 1). Figure 3A also shows
M13IX42 where each of the elements necessary for producing
a surface expression library between right and left half
randomized oligonucleotides is marked. The sequence
between the two Fok I sites shown by the arrow is the
portion of M13IX42 which is to be combined with a portion
of the left-half vector to produce random oligonucleotides
as fusion proteins of gene VIII.

M13IX22, or the left-half vector, was constructed
to harbor the left half populations of randomized
oligonucleotides. This vector was constructed from M13mp19
(Pharmacia, Piscataway, NJ) and contains: (1) two Fok I
sites for mixing with M13IX42 to bring together the left
and right halves of the randomized oligonucleotides; (2)
sequences necessary for expression such as a promoter and
signal sequence and translation initiation signals; (3) an
Eco RI-Sac I cloning site for the randomized
oligonucleotides; and (4) an amber stop codon for
biological selection in bringing together right and left
half oligonucleotides.

Of the two Fok I sites used for mixing M13IX22
with M13IX42, one is naturally encoded in M13mp18 and
M13mp19 (at position 3547). As with M13IX42, the overhang
within this naturally occurring Fok I site was changed to
CTTC. The other Fok I site was introduced after
construction of the translation initiation signals by
site-directed mutagenesis using the oligonucleotide
5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 25).

The translation initiation signals were
constructed by annealing of overlapping oligonucleotides as
described above to produce a double-stranded insert
containing a 5' Eco RI site and a 3' Hind III site. The
overlapping oligonucleotides are shown in Table VIII (SEQ
ID NOS: 26 through 34) and were ligated as a
double-stranded insert between the Eco RI and Hind III sites of
M13mp18 as described for the pseudo gene VIII insert. The
ribosome binding site (AGGAGAC) is located in
oligonucleotide 015 (SEQ ID NO: 26) and the translation
initiation codon (ATG) is the first three nucleotides of
oligonucleotide 016 (SEQ ID NO: 27).

TABLE VIII

Oligonucleotide Series for Construction of
Translation Signals in M13IX22

Oligonucleotide   Sequence (5' to 3')

015               AATT C GCC AAG GAG ACA GTC AT
016               AATG AAA TAC CTA TTG CCT ACG GCA
                  GCC GCT GGA TTG TT
017               ATTA CTC GCT GCC CAA CCA GCC ATG
                  GCC GAG CTC GTG AT
018               GACC CAG ACT CCA GATATC CAA CAG
                  GAA TGA GTG TTA AT
019               TCT AGA ACG CGT C
020               ACGT G ACG CGT TCT AGA AT TAA
                  CACTCA TTC CTG T
021               TG GAT ATC TGG AGT CTG GGT CAT
                  CAC GAG CTC GGC CAT G
022               GC TGG TTG GGC AGC GAG TAA TAA
                  CAA TCC AGC GGC TGC C
023               GT AGG CAA TAG GTA TTT CAT TAT
                  GAC TGT CCT TGG CG

Oligonucleotide 017 (SEQ ID NO: 27) contained a
Sac I restriction site 67 nucleotides downstream from the
ATG codon. The naturally occurring Eco RI site was removed
and a new site introduced 25 nucleotides downstream from
the Sac I. Oligonucleotides
5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTA-3' (SEQ ID NO: 35) and
5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 36)
were used to generate each of the
mutations, respectively. An amber stop codon was also
introduced at position 3263 of M13mp18 using the
oligonucleotide 5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO:
37).

In addition to the above mutations, a variety of
other modifications were made to remove certain sequences
and redundant restriction sites. The LAC Z ribosome
binding site was removed when the original Eco RI site in
M13mp18 was mutated. Also, the Fok I sites at positions
239, 6361 and 7244 of M13mp18 were likewise removed with
mutant oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID
NO: 38), 5'-CGAAAGGGGGGTGTGCTGCAA-3' (SEQ ID NO: 39) and
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 40), respectively.
Again, mutations within the coding region did not alter the
amino acid sequence.

The resultant vector, M13IX22, is 7320 base pairs
in length, the sequence of which is shown in Figure 6 (SEQ
ID NO: 2). The Sac I and Eco RI cloning sites are at
positions 6290 and 6314, respectively. Figure 3A also
shows M13IX22 where each of the elements necessary for
producing a surface expression library between right and
left half randomized oligonucleotides is marked.

Library Construction 



WO 94/11496                                  PCT/US93/10850

Each population of right and left half randomized
oligonucleotides from columns 1R through 40R and columns 1L
through 40L is cloned separately into M13IX42 and M13IX22,
respectively, to create sublibraries of right and left half
randomized oligonucleotides. Therefore, a total of eighty
sublibraries are generated. Each population of randomized
oligonucleotides is maintained separately until the final
screening step to ensure maximum efficiency of
annealing of right and left half oligonucleotides. The
greater efficiency increases the total number of randomized
oligonucleotides which can be obtained. Alternatively, one
can combine all forty populations of right half
oligonucleotides (columns 1R-40R) into one population and
all forty populations of left half oligonucleotides
(columns 1L-40L) into a second population to generate just
one sublibrary for each.

For the generation of sublibraries, each of the
above populations of randomized oligonucleotides is cloned
separately into the appropriate vector. The right half
oligonucleotides are cloned into M13IX42 to generate
sublibraries M13IX42.1R through M13IX42.40R. The left half
oligonucleotides are similarly cloned into M13IX22 to
generate sublibraries M13IX22.1L through M13IX22.40L. Each
vector contains unique Eco RI and Sac I restriction enzyme
sites which produce 5' and 3' single-stranded overhangs,
respectively, when digested. The single-stranded overhangs
are used for the annealing and ligation of the
complementary single-stranded random oligonucleotides.

The randomized oligonucleotide populations are
cloned between the Eco RI and Sac I sites by sequential
digestion and ligation steps. Each vector is treated with
an excess of Eco RI (New England Biolabs) at 37°C for 2
hours followed by addition of 4-24 units of calf intestinal
alkaline phosphatase (Boehringer Mannheim, Indianapolis,
IN). Reactions are stopped by phenol/chloroform extraction
and ethanol precipitation. The pellets are resuspended in
an appropriate amount of distilled or deionized water
(dH2O). About 10 pmol of vector is mixed with a 5000-fold
molar excess of each population of randomized
oligonucleotides in 10 µl of 1X ligase buffer (50 mM
Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA)
containing 1.0 U of T4 DNA ligase (BRL, Gaithersburg, MD).
The ligation is incubated at 16°C for 16 hours. Reactions
are stopped by heating at 75°C for 15 minutes and the DNA
is digested with an excess of Sac I (New England Biolabs)
for 2 hours. Sac I is inactivated by heating at 75°C for
15 minutes and the volume of the reaction mixture is
adjusted to 300 µl with an appropriate amount of 10X ligase
buffer and dH2O. One unit of T4 DNA ligase (BRL) is added
and the mixture is incubated overnight at 16°C. The DNA is
ethanol precipitated and resuspended in TE (10 mM Tris-HCl,
pH 8.0, 1 mM EDTA). DNA from each ligation is
electroporated into XL1 Blue™ cells (Stratagene, La Jolla,
CA), as described below, to generate the sublibraries.

E. coli XL1 Blue™ is electroporated as described
by Smith et al., Focus 12:38-40 (1990), which is
incorporated herein by reference. The cells are prepared
by inoculating a fresh colony of XL1s into 5 ml of SOB
without magnesium (20 g bacto-tryptone, 5 g bacto-yeast
extract, 0.584 g NaCl, 0.186 g KCl, dH2O to 1,000 ml) and
grown with vigorous aeration overnight at 37°C. SOB
without magnesium (500 ml) is inoculated at 1:1000 with the
overnight culture and grown with vigorous aeration at 37°C
until the OD550 is 0.8 (about 2 to 3 h). The cells are
harvested by centrifugation at 5,000 rpm (2,600 x g) in a
GS3 rotor (Sorvall, Newtown, CT) at 4°C for 10 minutes,
resuspended in 500 ml of ice-cold 10% (v/v) sterile
glycerol and centrifuged and resuspended a second time in
the same manner. After a third centrifugation, the cells
are resuspended in 10% sterile glycerol at a final volume
of about 2 ml, such that the OD550 of the suspension is 200
to 300. Usually, resuspension is achieved in the 10%
glycerol that remains in the bottle after pouring off the
supernate. Cells are frozen in 40 µl aliquots in
microcentrifuge tubes using a dry ice-ethanol bath and
stored frozen at -70°C.
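
Rotor speed and relative centrifugal force are related by the standard expression RCF = 1.118 x 10^-5 x r x rpm^2, with r the rotor radius in centimeters. The sketch below applies that expression to the quoted 5,000 rpm and 2,600 x g. The function names are illustrative, and the radius it prints is back-calculated from those two figures rather than taken from a rotor specification.

```python
RCF_CONSTANT = 1.118e-5  # for radius in cm and speed in rpm


def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_for(rpm: float, rcf_value: float) -> float:
    """Radius in cm at which a given speed produces a given RCF."""
    return rcf_value / (RCF_CONSTANT * rpm ** 2)


if __name__ == "__main__":
    radius = radius_for(5000, 2600)  # implied radius, an inference only
    print(f"implied radius: {radius:.2f} cm")
    print(f"RCF at 5,000 rpm and that radius: {rcf(5000, radius):.0f} x g")
```

Quoting both rpm and x g, as the text does, lets the step be reproduced in a rotor of different radius by matching the x g value instead of the speed.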

Frozen cells are electroporated by thawing slowly
on ice before use and mixing with about 10 pg to 500 ng of
vector per 40 µl of cell suspension. A 40 µl aliquot is
placed in a 0.1 cm electroporation chamber (Bio-Rad,
Richmond, CA) and pulsed once at 0°C using a 200 Ω parallel
resistor, 25 µF, 1.88 kV, which gives a pulse length (τ) of
~4 ms. A 10 µl aliquot of the pulsed cells is diluted
into 1 ml SOC (98 ml SOB plus 1 ml of 2 M MgCl2 and 1 ml of
2 M glucose) in a 12- x 75-mm culture tube, and the culture
is shaken at 37°C for 1 hour prior to culturing in
selective media (see below).
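
The pulse settings can be sanity-checked with two elementary relations: the nominal time constant of an RC discharge, τ = R x C, and the field strength, voltage divided by electrode gap. The sketch below is illustrative; the function names are assumptions, and the calculation ignores the sample's own resistance.

```python
def time_constant_ms(resistance_ohm: float, capacitance_uF: float) -> float:
    """Nominal RC time constant in milliseconds."""
    return resistance_ohm * capacitance_uF * 1e-6 * 1e3


def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Nominal electric field across the chamber gap."""
    return voltage_kv / gap_cm


if __name__ == "__main__":
    tau = time_constant_ms(200, 25)              # 200 ohm resistor, 25 uF
    field = field_strength_kv_per_cm(1.88, 0.1)  # 1.88 kV across 0.1 cm
    print(f"nominal time constant: {tau:.1f} ms")
    print(f"field strength: {field:.1f} kV/cm")
```

The nominal value of 5 ms is an upper bound, because the cell suspension conducts in parallel with the 200 Ω resistor and lowers the effective resistance; this is consistent with the observed pulse length of about 4 ms.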

Each of the eighty sublibraries is cultured
using methods known to one skilled in the art. Such
methods can be found in Sambrook et al., Molecular Cloning:
A Laboratory Manual, Cold Spring Harbor Laboratory, Cold
Spring Harbor, 1989, and in Ausubel et al., Current
Protocols in Molecular Biology, John Wiley and Sons, New
York, 1989, both of which are incorporated herein by
reference. Briefly, the above 1 ml sublibrary cultures
were grown up by diluting 50-fold into 2XYT media (16 g
tryptone, 10 g yeast extract, 5 g NaCl) and culturing at
37°C for 5-8 hours. The bacteria were pelleted by
centrifugation at 10,000 x g. The supernatant containing
phage was transferred to a sterile tube and stored at 4°C.

Double strand vector DNA containing right and
left half randomized oligonucleotide inserts is isolated
from the cell pellet of each sublibrary. Briefly, the
pellet is washed in TE (10 mM Tris, pH 8.0, 1 mM EDTA) and
recollected by centrifugation at 7,000 rpm for 5 minutes in a
Sorvall centrifuge (Newtown, CT). Pellets are resuspended
in 6 ml of 10% sucrose, 50 mM Tris, pH 8.0. 3.0 ml of 10
ng/µl lysozyme is added and incubated on ice for 20
minutes. 12 ml of 0.2 M NaOH, 1% SDS is added followed by
10 minutes on ice. The suspensions are then incubated on
ice for 20 minutes after addition of 7.5 ml of 3 M NaOAc,
pH 4.6. The samples are centrifuged at 15,000 rpm for 15
minutes at 4°C, RNased and extracted with
phenol/chloroform, followed by ethanol precipitation. The
pellets are resuspended, weighed and an equal weight of
CsCl is dissolved into each tube until a density of 1.60
g/ml is achieved. EtBr is added to 600 µg/ml and the
double-stranded DNA is isolated by equilibrium
centrifugation in a TV-1665 rotor (Sorvall) at 50,000 rpm
for 6 hours. These DNAs from each right and left half
sublibrary are used to generate forty libraries in which
the right and left halves of the randomized
oligonucleotides have been randomly joined together.

Each of the forty libraries is produced by
joining together one right half and one left half
sublibrary. The two sublibraries joined together
correspond to the same column number for right and left
half random oligonucleotide synthesis. For example,
sublibrary M13IX42.1R is joined with M13IX22.1L to produce
the surface expression library M13IX.1RL. In the
alternative situation where only two sublibraries are
generated from the combined populations of all right half
synthesis and all left half synthesis, only one surface
expression library would be produced.
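
The pairing of sublibraries by column number follows directly from the naming scheme in the text. A minimal sketch, with the function name `library_pairs` introduced here only for illustration:

```python
def library_pairs(n_columns: int = 40):
    """List (right sublibrary, left sublibrary, joined library) names."""
    return [
        (f"M13IX42.{i}R", f"M13IX22.{i}L", f"M13IX.{i}RL")
        for i in range(1, n_columns + 1)
    ]


if __name__ == "__main__":
    pairs = library_pairs()
    print(len(pairs), "libraries from", 2 * len(pairs), "sublibraries")
    print(pairs[0])
```

Eighty sublibraries therefore yield forty surface expression libraries, one per column number.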

For the random joining of each right and left
half oligonucleotide population into a single surface
expression vector species, the DNAs isolated from each
sublibrary are digested with an excess of Fok I (New England
Biolabs). The reactions are stopped by phenol/chloroform
extraction, followed by ethanol precipitation. Pellets are
resuspended in dH2O. Each surface expression library is
generated by ligating equal molar amounts (5-10 pmol) of
Fok I digested DNA isolated from corresponding right and
left half sublibraries in 10 µl of 1X ligase buffer
containing 1.0 U of T4 DNA ligase (Bethesda Research
Laboratories, Gaithersburg, MD). The ligations proceed
overnight at 16°C and are electroporated into the sup O
strain MK30-3 (Boehringer Mannheim Biochemical (BMB),
Indianapolis, IN) as previously described for XL1 cells.
Because MK30-3 is sup O, only the vector portions encoding
the randomized oligonucleotides which come together will
produce viable phage.

Screening of Surface Expression Libraries 

Purified phage are prepared from 50 ml liquid
cultures of XL1 Blue™ cells (Stratagene) which are infected
at a m.o.i. of 10 from the phage stocks stored at 4°C. The
cultures are induced with 2 mM IPTG. Supernatants from all
cultures are combined and cleared by two centrifugations,
and the phage are precipitated by adding 1/7.5 volumes of
PEG solution (25% PEG-8000, 2.5 M NaCl), followed by
incubation at 4°C overnight. The precipitate is recovered
by centrifugation for 90 minutes at 10,000 x g. Phage
pellets are resuspended in 25 ml of 0.01 M Tris-HCl, pH
7.6, 1.0 mM EDTA, and 0.1% Sarkosyl and then shaken slowly
at room temperature for 30 minutes. The solutions are
adjusted to 0.5 M NaCl and to a final concentration of 5%
polyethylene glycol. After 2 hours at 4°C, the
precipitates containing the phage are recovered by
centrifugation for 1 hour at 15,000 x g. The precipitates
are resuspended in 10 ml of NET buffer (0.1 M NaCl, 1.0 mM
EDTA, and 0.01 M Tris-HCl, pH 7.6), mixed well, and the
phage repelleted by centrifugation at 170,000 x g for 3
hours. The phage pellets are subsequently resuspended
overnight in 2 ml of NET buffer and subjected to cesium
chloride centrifugation for 18 hours at 110,000 x g (3.86
g of cesium chloride in 10 ml of buffer). Phage bands are
collected, diluted 7-fold with NET buffer, recentrifuged at
170,000 x g for 3 hours, resuspended, and stored at 4°C in
0.3 ml of NET buffer containing 0.1 mM sodium azide.

Ligand binding proteins used for panning on
streptavidin coated dishes are first biotinylated and then
absorbed against inactivated blocking phage (see below).
The biotinylating reagents are dissolved in
dimethylformamide at a ratio of 2.4 mg solid NHS-SS-Biotin
(sulfosuccinimidyl 2-(biotinamido)ethyl-1,3-dithiopropionate;
Pierce, Rockford, IL) to 1 ml solvent and
used as recommended by the manufacturer. Small-scale
reactions are accomplished by mixing 1 µl dissolved reagent
with 43 µl of 1 mg/ml ligand binding protein diluted in
sterile bicarbonate buffer (0.1 M NaHCO3, pH 8.6). After 2
hours at 25°C, residual biotinylating reagent is reacted
with 500 µl 1 M ethanolamine (pH adjusted to 9 with HCl)
for an additional 2 hours. The entire sample is diluted
with 1 ml TBS containing 1 mg/ml BSA, concentrated to about
50 µl on a Centricon 30 ultrafilter (Amicon), and washed
on the same filter three times with 2 ml TBS and once with
1 ml TBS containing 0.02% NaN3 and 7 x 10^[exponent illegible]
UV-inactivated blocking phage (see below); the final
retentate (60-80 µl) is stored at 4°C. Ligand binding
proteins biotinylated with the NHS-SS-Biotin reagent are
linked to biotin via a disulfide-containing chain.

UV-irradiated M13 phage were used for blocking
binding proteins which fortuitously bound filamentous phage
in general. M13mp8 (Messing and Vieira, Gene 19: 262-276
(1982), which is incorporated herein by reference) was
chosen because it carries two amber stop codons, which
ensure that the few phage surviving irradiation will not
grow in the sup O strains used to titer the surface
expression libraries. A 5 ml sample containing
5 x 10^[exponent illegible] M13mp8 phage, purified as
described above, was placed in a small petri plate and
irradiated with a germicidal lamp at
a distance of two feet for 7 minutes (flux 150 µW/cm²).
NaN3 was added to 0.02% and phage particles concentrated to
10^[exponent illegible] particles/ml on a Centricon 30-kDa
ultrafilter (Amicon).

For panning, polystyrene petri plates (60 x 15
mm, Falcon; Becton Dickinson, Lincoln Park, NJ) are
incubated with 1 ml of 1 mg/ml of streptavidin (BMB) in 0.1
M NaHCO3 pH 8.6-0.02% NaN3 in a small, air-tight plastic box
overnight in a cold room. The next day streptavidin is
removed and replaced with at least 10 ml blocking solution
(29 mg/ml of BSA; 3 µg/ml of streptavidin; 0.1 M NaHCO3 pH
8.6-0.02% NaN3) and incubated at least 1 hour at room
temperature. The blocking solution is removed and plates
are washed rapidly three times with Tris buffered saline
containing 0.5% Tween 20 (TBS-0.5% Tween 20).

Selection of phage expressing peptides bound by
the ligand binding proteins is performed with 5 µl (2.7 µg
ligand binding protein) of blocked biotinylated ligand
binding proteins reacted with a 50 µl portion of each
library. Each mixture is incubated overnight at 4°C,
diluted with 1 ml TBS-0.5% Tween 20, and transferred to a
streptavidin-coated petri plate prepared as described
above. After rocking 10 minutes at room temperature,
unbound phage are removed and plates washed ten times with
TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound
phage are eluted from plates with 800 µl sterile elution
buffer (1 mg/ml BSA, 0.1 M HCl, pH adjusted to 2.2 with
glycine) for 15 minutes and eluates neutralized with 48 µl
2 M Tris (pH unadjusted). A 20 µl portion of each eluate
is titered on MK30-3 concentrated cells with dilutions of
input phage.

A second round of panning is performed by
treating 750 µl of first eluate from each library with 5 mM
DTT for 10 minutes to break disulfide bonds linking biotin
groups to residual biotinylated binding proteins. The
treated eluate is concentrated on a Centricon 30
ultrafilter (Amicon), washed three times with TBS-0.5%
Tween 20, and concentrated to a final volume of about 50
µl. Final retentate is transferred to a tube containing
5.0 µl (2.7 µg ligand binding protein) blocked biotinylated
ligand binding proteins and incubated overnight. The
solution is diluted with 1 ml TBS-0.5% Tween 20, panned,
and eluted as described above on fresh streptavidin-coated
petri plates. The entire second eluate (800 µl) is
neutralized with 48 µl 2 M Tris, and 20 µl is titered
simultaneously with the first eluate and dilutions of the
input phage.

Individual phage populations are purified through 2 to 3 rounds
of plaque purification. Briefly, the second eluate titer plates
are lifted with nitrocellulose filters (Schleicher & Schuell,
Inc., Keene, NH) and processed by washing for 15 minutes in TBS
(10 mM Tris-HCl, pH 7.2, 150 mM NaCl), followed by an incubation
with shaking for an additional 1 hour at 37°C with TBS
containing 5% nonfat dry milk (TBS-5% NDM) at 0.5 ml/cm². The
wash is discarded and fresh TBS-5% NDM is added (0.1 ml/cm²)
containing the ligand binding protein between 1 nM to 100 mM,
preferably between 1 to 100 µM. All incubations are carried out
in heat-sealable pouches (Sears). Incubation with the ligand
binding protein proceeds for 12-16 hours at 4°C with shaking.
The filters are removed from the bags and washed 3 times for 30
minutes at room temperature with 150 mls of TBS containing 0.1%
NDM and 0.2% NP-40 (Sigma, St. Louis, MO). The filters are then
incubated for 2 hours at room temperature in antiserum against
the ligand binding protein at an appropriate dilution in
TBS-0.5% NDM, washed in 3 changes of TBS containing 0.1% NDM and
0.2% NP-40 as described above and incubated in TBS containing
0.1% NDM and 0.2% NP-40 with 1 x 10*^ cpm of 125I-labeled
Protein A (specific activity = 2.1 x 10^ cpm/µg). After a
washing with TBS containing 0.1% NDM and 0.2% NP-40 as described
above, the filters are wrapped in Saran Wrap and exposed to
Kodak X-Omat x-ray film (Kodak, Rochester, NY) for 1-12 hours at
-70°C using Dupont Cronex Lightning Plus Intensifying Screens
(Dupont, Wilmington, DE).

Positive plaques identified are cored with the large end of a
pasteur pipet and placed into 1 ml of SM (5.8 g NaCl, 2 g
MgSO4·7H2O, 50 ml 1 M Tris-HCl, pH 7.5, 5 mls 2% gelatin, to
1000 mls with dH2O) plus 1-3 drops of CHCl3 and incubated at
37°C 2-3 hours or overnight at 4°C. The phage are diluted 1:500
in SM and 2 µl are added to 300 µl of XL1 cells plus 3 mls of
soft agar per 100 mm² plate. The XL1 cells are prepared for
plating by growing a colony overnight in 10 ml LB (10 g
bacto-tryptone, 5 g bacto-yeast extract, 10 g NaCl, 1000 ml
dH2O) containing 100 µl of 20% maltose and 100 µl of 1 M MgSO4.
The bacteria are pelleted by centrifugation at 2000 x g for 10
minutes and the pellet is resuspended gently in 10 mls of 10 mM
MgSO4. The suspension is diluted 4-fold by adding 30 mls of
10 mM MgSO4 to give an OD600 of approximately 0.5. The second
and third round screens are identical to that described above
except that the plaques are cored with the small end of a
pasteur pipet and placed into 0.5 mls SM plus a drop of CHCl3
and 1-5 µl of the phage following incubation are used for
plating without dilution. At the end of the third round of
purification, an individual plaque is picked and the templates
prepared for sequencing.
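
The dilutions stated in the plating procedure can be checked with
a short calculation. The following Python sketch is illustrative
only; the function name is hypothetical, and the 1:500 phage
dilution is assumed to mean one part phage in 500 parts total.

```python
def dilution_factor(sample_volume, diluent_volume):
    """Fold dilution obtained by adding diluent to a sample (same units)."""
    return (sample_volume + diluent_volume) / sample_volume


# XL1 cells: 10 ml of resuspended cells plus 30 ml of 10 mM MgSO4.
cell_fold = dilution_factor(10.0, 30.0)

# Phage: assumed 1 part in 500 total; 2 ul of the dilution is plated.
phage_fold = 500.0
plated_ul = 2.0
equivalent_undiluted_ul = plated_ul / phage_fold

print(f"cells diluted {cell_fold:g}-fold")
print(f"plated phage equals {equivalent_undiluted_ul:g} ul of undiluted core eluate")
```

Adding 30 ml to 10 ml gives the 4-fold dilution stated in the
text.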

Template Preparation and Sequencing

Templates are prepared for sequencing by inoculating a 1 ml
culture of 2XYT containing a 1:100 dilution of an overnight
culture of XL1 with an individual plaque. The plaques are
picked using a sterile toothpick. The culture is incubated at
37°C for 5-6 hours with shaking and then transferred to a 1.5 ml
microfuge tube. 200 µl of PEG solution is added, followed by
vortexing, and the tube is placed on ice for 10 minutes. The
phage precipitate is recovered by centrifugation in a microfuge
at 12,000 x g for 5 minutes. The supernatant is discarded and
the pellet is resuspended in 230 µl of TE (10 mM Tris-HCl, pH
7.5, 1 mM EDTA) by gently pipetting with a yellow pipet tip.
Phenol (200 µl) is added, followed by a brief vortex, and the
tube is microfuged to separate the phases. The aqueous phase is
transferred to a separate tube and extracted with 200 µl of
phenol/chloroform (1:1) as described above for the phenol
extraction. A 0.1 volume of 3 M NaOAc is added, followed by
addition of 2.5 volumes of ethanol, and the templates are
precipitated at -20°C for 20 minutes. The precipitated
templates are recovered by centrifugation in a microfuge at
12,000 x g for 8 minutes. The pellet is washed in 70% ethanol,
dried and resuspended in 25 µl TE. Sequencing was performed
using a Sequenase™ sequencing kit following the protocol
supplied by the manufacturer (U.S. Biochemical, Cleveland, OH).

EXAMPLE II

Isolation and Characterization of Peptide Ligands Generated
From Oligonucleotides Having Random Codons at Two
Predetermined Positions

This example shows the generation of a surface expression
library from a population of oligonucleotides having randomized
codons. The oligonucleotides are ten codons in length and are
cloned into a single vector species for the generation of a M13
gene VIII-based surface expression library. The example also
shows the selection of peptides for a ligand binding protein and
characterization of their encoded nucleic acid sequences.

Oligonucleotide Synthesis

Oligonucleotides were synthesized as described in Example I.
The synthesizer was programmed to synthesize the sequences shown
in Table IX. These sequences correspond to the first random
codon position synthesized and 3' flanking sequences of the
oligonucleotide which hybridizes to the leader sequence in the
vector. The complementary sequences are used for insertional
mutagenesis of the synthesized population of oligonucleotides.



Table IX

Column        Sequence (5' to 3')

column 1      AA(A/C)GGTTGGTCGGTACCGG
column 2      AG(A/G)GGTTGGTCGGTACCGG
column 3      AT(A/G)GGTTGGTCGGTACCGG
column 4      AC(A/G)GGTTGGTCGGTACCGG
column 5      CA(G/T)GGTTGGTCGGTACCGG
column 6      CT(G/C)GGTTGGTCGGTACCGG
column 7      AG(T/C)GGTTGGTCGGTACCGG
column 8      AT(T/C)GGTTGGTCGGTACCGG
column 9      CC(A/C)GGTTGGTCGGTACCGG
column 10     T(A/T)TGGTTGGTCGGTACCGG
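
The relationship between the ten column sequences of Table IX
and the amino acids they can encode may be illustrated with a
short Python sketch. It assumes, as an interpretation rather
than a statement of the specification, that the listed triplets
are written as the antisense strand, so that the reverse
complement of each triplet is the sense codon. All names in the
sketch are hypothetical, and the standard genetic code is used.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, keyed by sense-strand DNA codon.
CODON_TABLE = {
    a + b + c: AMINO[i] for i, (a, b, c) in enumerate(product(BASES, repeat=3))
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Random-codon triplet of each Table IX column (flanking sequence omitted).
# A two-letter entry is an equal mixture of the two bases at that position.
COLUMN_TRIPLETS = {
    1: ("A", "A", "AC"),
    2: ("A", "G", "AG"),
    3: ("A", "T", "AG"),
    4: ("A", "C", "AG"),
    5: ("C", "A", "GT"),
    6: ("C", "T", "GC"),
    7: ("A", "G", "TC"),
    8: ("A", "T", "TC"),
    9: ("C", "C", "AC"),
    10: ("T", "AT", "T"),
}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def encoded_residues(triplet):
    """Return (sense codon, amino acid) pairs for one column (antisense assumed)."""
    pairs = []
    for bases in product(*triplet):
        sense = reverse_complement("".join(bases))
        pairs.append((sense, CODON_TABLE[sense]))
    return pairs


coverage = {col: encoded_residues(t) for col, t in COLUMN_TRIPLETS.items()}
all_residues = sorted(aa for pairs in coverage.values() for _, aa in pairs)

for col, pairs in coverage.items():
    print(col, pairs)
print(len(all_residues), "".join(all_residues))
```

Under this reading, each column contributes two codons and the
ten columns together specify twenty different amino acids with
no stop codon.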



The next eight random codon positions were synthesized as
described for Table V in Example I. Following the ninth
position synthesis, the reaction products were once more
combined, mixed and redistributed into 10 new reaction columns.
Synthesis of the last random codon position and 5' flanking
sequences are shown in Table X.

Table X

Column        Sequence (5' to 3')

column 1      AGGATCCGCCGAGCTCAA(A/C)A
column 2      AGGATCCGCCGAGCTCAG(A/G)A
column 3      AGGATCCGCCGAGCTCAT(A/G)A
column 4      AGGATCCGCCGAGCTCAC(A/G)A
column 5      AGGATCCGCCGAGCTCCA(G/T)A
column 6      AGGATCCGCCGAGCTCCT(G/C)A
column 7      AGGATCCGCCGAGCTCAG(T/C)A
column 8      AGGATCCGCCGAGCTCAT(T/C)A
column 9      AGGATCCGCCGAGCTCCC(A/C)A
column 10     AGGATCCGCCGAGCTCT(A/T)TA



The reaction products were mixed once more and the
oligonucleotides cleaved and purified as recommended by the
manufacturer. The purified population of oligonucleotides was
used to generate a surface expression library as described
below.



Vector Construction 



The vector used for generating surface expression libraries
from a single oligonucleotide population (i.e., without joining
together of right and left half oligonucleotides) is described
below. The vector is a M13-based expression vector which
directs the synthesis of gene VIII-peptide fusion proteins
(Figure 4). This vector exhibits all the functions that the
combined right and left half vectors of Example I exhibit.

An M13-based vector was constructed for the cloning and surface
expression of populations of random oligonucleotides (Figure 4,
M13IX30). M13mp19 (Pharmacia) was the starting vector. This
vector was modified to contain, in addition to the encoded wild
type M13 gene VIII: (1) a pseudo-wild type gene VIII sequence
with an amber stop codon placed between it and the restriction
sites for cloning oligonucleotides; (2) Stu I, Spe I and Xho I
restriction sites in frame with the pseudo-wild type gVIII for
cloning oligonucleotides; (3) sequences necessary for
expression, such as a promoter, signal sequence and translation
initiation signals; (4) various other mutations to remove
redundant restriction sites and the amino terminal portion of
Lac Z.

Construction of M13IX30 was performed in four steps. In the
first step, a precursor vector containing the pseudo gene VIII
and various other mutations was constructed, M13IX01F. The
second step involved the construction of a small cloning site in
a separate M13mp18 vector to yield M13IX03. In the third step,
expression sequences and cloning sites were constructed in
M13IX03 to generate the intermediate vector M13IX04B. The
fourth step involved the incorporation of the newly constructed
sequences from the intermediate vector into M13IX01F to yield
M13IX30. Incorporation of these sequences linked them with the
pseudo gene VIII.

Construction of the precursor vector M13IX01F was similar to
that of M13IX42 described in Example I except for the following
features: (1) M13mp19 was used as the starting vector; (2) the
Fok I site 5' to the unique Eco RI site was not incorporated and
the overhang at the naturally occurring Fok I site at position
3547 was not changed to 5'-CTTC-3'; (3) the spacer sequence was
not incorporated between the Eco RI and Sac I sites; and (4) the
amber codon at position 4492 was not incorporated.

In the second step, M13mp18 was mutated to remove the 5' end of
Lac Z up to the Lac i binding site and including the Lac Z
ribosome binding site and start codon. Additionally, the
polylinker was removed and a Mlu I site was introduced in the
coding region of Lac Z. A single oligonucleotide was used for
this mutagenesis and had the sequence
"5'-AAACGACGGCCAGTGCCAAGTGACGCGTGTGAAATTGTTATCC-3'" (SEQ ID NO:
41). Restriction enzyme sites for Hind III and Eco RI were
introduced downstream of the Mlu I site using the
oligonucleotide
"5'-GGCGAAAGGGAATTCTGCAAGGCGATTAAGCTTGGGTAACGCC-3'" (SEQ ID NO:
42). These modifications of M13mp18 yielded the vector M13IX03.

The expression sequences and cloning sites were 
introduced into M13IX03 by chemically synthesizing a series 
of oligonucleotides which encode both strands of the 
desired sequence. The oligonucleotides are presented in 
Table XI (SEQ ID NOS: 43 through 50). 

TABLE XI
M13IX30 Oligonucleotide Series

Top Strand
Oligonucleotides    Sequence (5' to 3')

084                 GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG
027                 TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028                 TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC
029                 TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG

Bottom Strand
Oligonucleotides    Sequence (5' to 3')

085                 TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031                 GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT
032                 TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA
033                 GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA



The above oligonucleotides except for the terminal
oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ ID NO: 47) of
Table XI were mixed, phosphorylated, annealed and ligated to
form a double stranded insert as described in Example I.
However, instead of cloning directly into the intermediate
vector the insert was first amplified by PCR using the terminal
oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ ID NO: 47) as
primers. The terminal oligonucleotide 084 (SEQ ID NO: 43)
contains a Hind III site 10 nucleotides internal to its 5' end.
Oligonucleotide 085 (SEQ ID NO: 47) has an Eco RI site at its 5'
end. Following amplification, the products were restricted with
Hind III and Eco RI and ligated as described in Example I into
the polylinker of M13mp18 digested with the same two enzymes.
The resultant double stranded insert contained a ribosome
binding site, a translation initiation codon followed by a
leader sequence and three restriction enzyme sites for cloning
random oligonucleotides (Xho I, Stu I, Spe I). The vector was
named M13IX04.
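
The positions of the restriction sites in the two PCR primers
can be located with a simple string search. The Python sketch
below is illustrative; the primer sequences are those of
oligonucleotides 084 and 085 in Table XI as read from the
specification, the recognition sequences are the standard ones
for each enzyme, and the function name is hypothetical.

```python
# Standard recognition sequences for the enzymes named in the text.
SITES = {
    "Hind III": "AAGCTT",
    "Eco RI": "GAATTC",
    "Bam HI": "GGATCC",
    "Spe I": "ACTAGT",
}

OLIGO_084 = "GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG"
OLIGO_085 = "TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG"


def find_sites(seq):
    """Map enzyme name to the 0-based offset of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


print("084:", find_sites(OLIGO_084))
print("085:", find_sites(OLIGO_085))
```

The Hind III site of oligonucleotide 084 begins after ten
nucleotides, in agreement with the text. In the sequence of
oligonucleotide 085 as listed, the Eco RI site likewise begins
after ten nucleotides and is followed directly by Bam HI and
Spe I sites.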

During cloning of the double-stranded insert, it was found that
one of the GCC codons in oligonucleotide 028 and its complement
in 031 was deleted. Since this deletion did not affect
function, the final construct is missing one of the two GCC
codons. Additionally, oligonucleotide 032 contained a GTG codon
where a GAG codon was needed. Mutagenesis was performed using
the oligonucleotide 5'-TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 51)
to convert the codon to the desired sequence. The resultant
intermediate vector was named M13IX04B.
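
The single base change introduced by the mutagenic
oligonucleotide can be illustrated by aligning it against the
bottom strand sequence. The Python sketch below assumes that
oligonucleotides 032 and 033 of Table XI abut in that order in
the assembled bottom strand; the function name is hypothetical
and the comparison is a plain ungapped alignment.

```python
OLIGO_032 = "TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA"
OLIGO_033 = "GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA"
MUTAGENIC = "TAACGGTAAGAGTGCCAGTGC"  # SEQ ID NO: 51


def best_alignment(template, oligo):
    """Return (offset, mismatches) for the ungapped placement with fewest mismatches.

    Each mismatch is (index in oligo, template base, oligo base).
    """
    best = None
    for offset in range(len(template) - len(oligo) + 1):
        window = template[offset:offset + len(oligo)]
        mismatches = [
            (i, t, o) for i, (t, o) in enumerate(zip(window, oligo)) if t != o
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


offset, mismatches = best_alignment(OLIGO_032 + OLIGO_033, MUTAGENIC)
print(offset, mismatches)
```

Under the stated assumption, the oligonucleotide matches the
template at every position but one, where a T in the original
sequence is replaced by an A, which corresponds to the GTG to
GAG change described above.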

The fourth step in constructing M13IX30 involved inserting the
expression and cloning sequences from M13IX04B upstream of the
pseudo-wild type gVIII in M13IX01F. This was accomplished by
digesting M13IX04B with Dra III and Bam HI and gel isolating the
700 base pair insert containing the sequences of interest.
M13IX01F was likewise digested with Dra III and Bam HI. The
insert was combined with the double digested vector at a molar
ratio of 3:1 and ligated as described in Example I. It should
be noted that all modifications in the vectors described herein
were confirmed by sequence analysis. The sequence of the final
construct, M13IX30, is shown in Figure 7 (SEQ ID NO: 3). Figure
4 also shows M13IX30 where each of the elements necessary for
surface expression of randomized oligonucleotides is marked.
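
The mass of insert corresponding to a 3:1 molar ratio can be
estimated from the relative lengths of insert and vector. The
Python sketch below is illustrative only. The 700 base pair
insert length is taken from the text, whereas the vector length
and vector mass used in the example call are hypothetical
placeholders, since the specification does not state them here.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Assumes double stranded DNA, so that mass per mole scales with length.
    """
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


# Hypothetical example: 100 ng of a 7000 bp digested vector, 700 bp insert.
needed = insert_mass_ng(vector_ng=100.0, vector_bp=7000, insert_bp=700)
print(f"{needed:.1f} ng of insert")
```

Because the insert is much shorter than the vector, a threefold
molar excess of insert corresponds to a considerably smaller
mass of insert than of vector.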

Library Construction, Screening and Characterization of 
Encoded Oligonucleotides 

Construction of an M13IX30 surface expression library is
accomplished identically to that described in Example I for
sublibrary construction except the oligonucleotides described
above are inserted into M13IX30 by mutagenesis instead of by
ligation. The library is constructed and propagated on MK30-3
(BMB) and phage stocks are prepared for infection of XL1 cells
and screening. The surface expression library is screened and
encoding oligonucleotides characterized as described in Example
I.

EXAMPLE III 

Isolation and Characterization of Peptide Ligands
Generated from Right and Left Half
Degenerate Oligonucleotides

This example shows the construction and expression of a surface
expression library of degenerate oligonucleotides. The encoded
peptides of this example derive from the mixing and joining
together of two separate oligonucleotide populations. Also
demonstrated is the isolation and characterization of peptide
ligands and their corresponding nucleotide sequence for specific
binding proteins.

Synthesis of Oligonucleotide Populations

A population of left half degenerate oligonucleotides and a
population of right half degenerate oligonucleotides were
synthesized using standard automated procedures as described in
Example I.

The degenerate codon sequences for each population of
oligonucleotides were generated by sequentially synthesizing the
triplet NNG/T where N is an equal mixture of all four
nucleotides. The antisense sequence for each population of
oligonucleotides was synthesized and each population contained
5' and 3' flanking sequences complementary to the vector
sequence. The complementary termini were used to incorporate
each population of oligonucleotides into their respective
vectors by standard mutagenesis procedures. Such procedures
have been described previously in Example I and in the Detailed
Description. Synthesis of the antisense sequence of each
population was necessary since the single-stranded form of the
vectors is obtained only as the sense strand.
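
The coding capacity of the degenerate triplet NNG/T can be
enumerated directly. The Python sketch below is illustrative;
it uses the standard genetic code and hypothetical names, and
also derives the antisense form of each triplet, which begins
with A or C as in the (A/C)NN notation used for the synthesized
oligonucleotides.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, keyed by sense-strand DNA codon.
CODON_TABLE = {
    a + b + c: AMINO[i] for i, (a, b, c) in enumerate(product(BASES, repeat=3))
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Sense codons of the form N N (G/T).
sense_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
translated = {codon: CODON_TABLE[codon] for codon in sense_codons}

stop_codons = sorted(c for c, aa in translated.items() if aa == "*")
residues = sorted({aa for aa in translated.values() if aa != "*"})
antisense_triplets = [reverse_complement(c) for c in sense_codons]

print(len(sense_codons), "codons")
print(len(residues), "amino acids:", "".join(residues))
print("stop codons:", stop_codons)
```

The enumeration gives 32 codons that together encode all twenty
amino acids, with the amber codon TAG as the only stop codon.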

The left half oligonucleotide population was synthesized having
the following sequence: 5'-
AGCTCCCGGATGCCTO^GaU^GATG (A/CNN )5GGCTTTTGCCi«aVGa (SEQ ID
NO: 52). The right half oligonucleotide population was
synthesized having the following sequence:
5'-CAGCCTCGGATCCGCC(A/CNN)10ATG(A/C)GAAT-3' (SEQ ID NO: 53).
These two oligonucleotide populations when incorporated into
their respective vectors and joined together encode a 20 codon
oligonucleotide having 19 degenerate positions and an internal
predetermined codon sequence.

Vector Construction

Modified forms of the previously described vectors were used
for the construction of right and left half sublibraries. The
construction of left half sublibraries was performed in an
M13-based vector termed M13ED03. This vector is a modified form
of the previously described M13IX30 vector and contains all the
essential features of both M13IX30 and M13IX22. M13ED03
contains, in addition to a wild type and a pseudo-wild type gene
VIII, sequences necessary for expression and two Fok I sites for
joining with a right half oligonucleotide sublibrary.
Therefore, this vector combines the advantages of both previous
vectors in that it can be used for the generation and expression
of surface expression libraries from a single oligonucleotide
population or it can be joined with a sublibrary to bring
together right and left half oligonucleotide populations into a
surface expression library.

M13ED03 was constructed in two steps from M13IX30. The first
step involved the modification of M13IX30 to remove a redundant
sequence and to incorporate a sequence encoding the eight
amino-terminal residues of human β-endorphin. The leader
sequence was also mutated to increase secretion of the product.

During construction of M13IX04 (an intermediate vector to
M13IX30 which is described in Example II), a six nucleotide
sequence was duplicated in oligonucleotide 027 (SEQ ID NO: 44)
and its complement 032 (SEQ ID NO: 49). This sequence,
5'-TTACCG-3', was deleted by mutagenesis in the construction of
M13ED01. The oligonucleotide used for the mutagenesis was
5'-GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ ID NO: 54). The mutation
in the leader sequence was generated using the oligonucleotide
5'-GGGCTTTTGCCACAGGGGT-3' (SEQ ID NO: 55). This mutagenesis
resulted in the A residue at position 6353 of M13IX30 being
changed to a G residue. The resultant vector was designated
M13IX32.

To generate M13ED01, the nucleotide sequence encoding endorphin
(8 amino acid residues of β-endorphin plus 3 extra amino acid
residues) was incorporated after the leader sequence by
mutagenesis. The oligonucleotide used had the following
sequence:
5'-AGGGTCATCGCCTTCAGCTCCGGATCCCTCAGAAGTCATAAACCCCCCATAGGCTTTTGCCAC-3'
(SEQ ID NO: 56). This mutagenesis also removed some of the
downstream sequences through the Spe I site.

The second step in the construction of M13ED03 involved vector
changes which put the β-endorphin sequence in frame with the
downstream pseudo-gene VIII sequence and incorporated a Fok I
site for joining with a sublibrary of right half
oligonucleotides. This vector was designed to incorporate
oligonucleotide populations by mutagenesis using sequences
complementary to those flanking or overlapping with the encoded
β-endorphin sequence. The absence of β-endorphin expression
after mutagenesis can therefore be used to measure the
mutagenesis frequency. In addition to the above vector changes,
M13ED03 was also modified to contain an amber codon at position
3262 for biological selection during joining of right and left
half sublibraries.

The mutations were incorporated using standard mutagenesis
procedures as described in Example I. The frame shift changes
and Fok I site were generated using the oligonucleotide
5'-TCGCCTTCAGCTCCCGGATGCCTCAGAAGCATGAACCCCCCATAGGC-3' (SEQ ID
NO: 57). The amber codon was generated using the
oligonucleotide 5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO:
58). The full sequence of the resultant vector, M13ED03, is
provided in Figure 8 (SEQ ID NO: 4).

The construction of right half oligonucleotide sublibraries was
performed in a modified form of the M13IX42 vector. The new
vector, M13IX421, is identical to M13IX42 except that the amber
codon between the Eco RI-Sac I cloning site and the pseudo-gene
VIII sequence was removed. This change ensures that all
expression off of the Lac Z promoter produces a peptide-gene
VIII fusion protein. Removal of the amber codon was performed
by mutagenesis using the following oligonucleotide:
5'-GCCTTCAGCCTCGGATCCGCC-3' (SEQ ID NO: 59). The full sequence
of M13IX421 is shown in Figure 9 (SEQ ID NO: 5).

Library Construction, Screening and Characterization of 
Encoded Oligonucleotides 

A sublibrary was constructed for each of the previously
described degenerate populations of oligonucleotides. The left
half population of oligonucleotides was incorporated into
M13ED03 to generate the sublibrary M13ED03.L and the right half
population of oligonucleotides was incorporated into M13IX421 to
generate the sublibrary M13IX421.R. Each of the oligonucleotide
populations was incorporated into its respective vector using
site-directed mutagenesis as described in Example I. Briefly,
the nucleotide sequences flanking the degenerate codon sequences
were complementary to the vector at the site of incorporation.
The populations of nucleotides were hybridized to
single-stranded M13ED03 or M13IX421 vectors and extended with T4
DNA polymerase to generate a double-stranded circular vector.
Mutant templates were obtained by uridine selection in vivo, as
described by Kunkel et al., supra. Each of the vector
populations was electroporated into host cells and propagated as
described in Example I.

The random joining of right and left half sublibraries into a
single surface expression library was accomplished as described
in Example I except that prior to digesting each vector
population with Fok I they were first digested with an enzyme
that cuts in the unwanted portion of each vector. Briefly,
M13ED03.L was digested with Bgl II (cuts at 7094) and M13IX421.R
was digested with Hind III (cuts at 3919). Each of the digested
populations was further treated with alkaline phosphatase to
ensure that the ends would not religate and then digested with
an excess of Fok I. Ligations, electroporation and propagation
of the resultant library were performed as described in Example
I.

The surface expression library was screened for ligand binding
proteins using a modified panning procedure. Briefly, 1 ml of
the library, about 10" phage particles, was added to 1-5 µg of
the ligand binding protein. The ligand binding protein was
either an antibody or receptor globulin (Rg) molecule, Aruffo et
al., Cell 61:1303-1313 (1990), which is incorporated herein by
reference. Phage were incubated shaking with affinity ligand at
room temperature for 1 to 3 hours followed by the addition of
200 µl of 1 µm latex beads (Biosite, San Diego, CA) which were
coated with goat-antimouse IgG. This mixture was incubated
shaking for an additional 1-2 hours at room temperature. Beads
were pelleted for 2 minutes by centrifugation in a microfuge and
washed with TBS which can contain 0.1% Tween 20. Three
additional washes were performed where the last wash did not
contain any Tween 20.

Beads containing bound phage were added to plates at a
concentration that produces a suitable density for plaque
identification screening and sequencing of positive clones
(i.e., plated at confluency for rare clones and 200-500
plaques/plate if pure plaques were needed). Briefly, plaques
grown for about 6 hours at 37°C were overlaid with
nitrocellulose filters that had been soaked in 2 mM IPTG and
briefly dried. The filters remained on the plaques overnight at
room temperature, were removed and placed in blocking solution
for 1-2 hours. Following blocking, the filters were incubated
in 1 µg/ml ligand binding protein in blocking solution for 1-2
hours at room temperature. Goat antimouse Ig-coupled alkaline
phosphatase (Fisher) was added at a 1:1000 dilution and the
filters were rapidly washed with 10 mls of TBS or block solution
over a glass vacuum filter. Positive plaques were identified
after alkaline phosphatase development for detection.

Alternatively, the bound phage were eluted from the beads using
200 µl 0.1 M Glycine-HCl, pH 2.2, for 15 minutes and the beads
were removed by centrifugation. The supernatant containing
phage (eluate) was removed and phage exhibiting binding to the
ligand binding protein were further enriched by one to two more
cycles of panning. The eluates were screened by plaque
formation, as described above. Typical yields after the first
eluate were about 1 X 10* - 5 X 10* pfu. The second and third
eluate generally yielded about 5 x 10* - 2 x 10^ pfu and
5 x 10^ - 1 x 10" pfu, respectively.

Screening of the degenerate oligonucleotide library with several
different ligand binding proteins resulted in the identification
of peptide sequences which bound to each of the ligands. For
example, screening with an antibody to β-endorphin resulted in
the detection of about 30-40 different clones which essentially
all had the core amino acid sequence known to interact with the
antibody. The sequences flanking the core sequences were
different showing that they were independently derived and not
duplicates of the same clone. Screening with an antibody known
as 57 gave similar results (i.e., a core consensus sequence was
identified but the flanking sequences among the clones were
different).

EXAMPLE IV 

Generation of a Left Half Random Oligonucleotide Library

This example shows the synthesis and construction of a left
half random oligonucleotide library.

A population of random oligonucleotides nine codons in length
was synthesized as described in Example I except that different
sequences at their 5' and 3' ends were synthesized so that they
could be easily inserted into the vector by mutagenesis. Also,
the mixing and dividing steps for generating random
distributions of reaction products were performed by the
alternative method of dispensing equal volumes of bead
suspensions. The liquid chosen that was dense enough for the
beads to remain dispersed was 100% acetonitrile.

Briefly, each column was prepared for the first coupling
reaction by suspending 22 mg (1 µmole) of 48 µmol/g capacity
beads (Genta, San Diego, CA) in 0.5 mls of 100% acetonitrile.
These beads are smaller than those described in Example I and
are derivatized with a guanine nucleotide. They also do not
have a controlled pore size. The bead suspension was then
transferred to an empty reaction column. Suspensions were kept
relatively dispersed by gently pipetting the suspension during
transfer. Columns were plugged and monomer coupling reactions
were performed as shown in Table XII.

Table XII

Column        Sequence (5' to 3')

column 1L     AA(A/C)GGCTTTTGCCACAGG
column 2L     AG(A/G)GGCTTTTGCCACAGG
column 3L     AT(A/G)GGCTTTTGCCACAGG
column 4L     AC(A/G)GGCTTTTGCCACAGG
column 5L     CA(G/T)GGCTTTTGCCACAGG
column 6L     CT(G/C)GGCTTTTGCCACAGG
column 7L     AG(T/C)GGCTTTTGCCACAGG
column 8L     AT(T/C)GGCTTTTGCCACAGG
column 9L     CC(A/C)GGCTTTTGCCACAGG
column 10L    T(A/T)TGGCTTTTGCCACAGG
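
The bead quantity used to prepare each column can be checked
against the stated loading capacity. The Python sketch below is
illustrative and uses a hypothetical function name; the 22 mg
bead mass and 48 µmol/g capacity are the values given above.

```python
def bead_loading_umol(bead_mass_mg, capacity_umol_per_g):
    """Synthesis scale (umol) supplied by a given mass of derivatized beads."""
    return bead_mass_mg / 1000.0 * capacity_umol_per_g


loading = bead_loading_umol(22.0, 48.0)
print(f"{loading:.3f} umol per column")
```

The result, slightly more than 1 µmole, is consistent with the
approximately 1 µmole scale stated for each column.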
After coupling of the last monomer, the columns were unplugged
as described previously and their contents were poured into a
1.5 ml microfuge tube. The columns were rinsed with 100%
acetonitrile to recover any remaining beads. The volume used
for rinsing was determined so that the final volume of total
bead suspension was about 100 µl for each new reaction column
that the beads would be aliquoted into. The mixture was
vortexed gently to produce a uniformly dispersed suspension and
then divided, with constant pipetting of the mixture, into equal
volumes. Each mixture of beads was then transferred to an empty
reaction column. The empty tubes were washed with a small
volume of 100% acetonitrile and the washes were also transferred
to their respective columns. Random codon positions 2 through 9
were then synthesized as described in Example I where the mixing
and dividing steps were performed using a suspension in 100%
acetonitrile. The coupling reactions for codon positions 2
through 9 are shown in Table XIII.

Table XIII

Column        Sequence (5' to 3')

column 1L     AA(A/C)A
column 2L     AG(A/G)A
column 3L     AT(A/G)A
column 4L     AC(A/G)A
column 5L     CA(G/T)A
column 6L     CT(G/C)A
column 7L     AG(T/C)A
column 8L     AT(T/C)A
column 9L     CC(A/C)A
column 10L    T(A/T)TA



After coupling of the last monomer for the ninth codon
position, the reaction products were mixed and a portion was
transferred to an empty reaction column. Columns were plugged
and the following monomer coupling reactions were performed:
5'-CGGATGCCTCAGAAGCCCCXXA-3' (SEQ ID NO: 60). The resulting
population of random oligonucleotides was purified and
incorporated by mutagenesis into the left half vector M13ED04.

M13ED04 is a modified version of the M13ED03 vector described
in Example III and therefore contains all the features of that
vector. The difference between M13ED03 and M13ED04 is that
M13ED04 does not contain the five amino acid sequence (Tyr Gly
Gly Phe Met) recognized by anti-β-endorphin antibody. This
sequence was deleted by mutagenesis using the oligonucleotide
5'-CGGATGCCTCAGAAGGGCTTTTGCCACAGG-3' (SEQ ID NO: 61). The
entire nucleotide sequence of this vector is shown in Figure 10
(SEQ ID NO: 6).

EXAMPLE V 

Generation of Soluble, Conformationally Constrained 

Random Peptides 

This example shows the synthesis and construction of
expressible oligonucleotides encoding soluble peptides having a
constrained secondary structure in solution.

As noted previously, the binding affinity of a peptide for a
ligand-binding protein is a function of the primary and
secondary structure of the peptide. The effect of primary
structure on affinity may be determined as disclosed in the
above examples.

In its broadest form, the disclosed method provides
oligonucleotides that are synthesized having a desired bias of
predetermined codons such that the oligonucleotides encode
peptides having a constrained secondary structure in aqueous
solution. In a preferred embodiment, oligonucleotides encoding
peptides having a constrained secondary structure are
synthesized having a desired bias of predetermined codons such
that the predetermined codons are separated by at least one
random codon.

Oligonucleotides having more than one tuplet
encoding an amino acid capable of forming a covalent bond
at a predetermined position and the remaining positions
having random tuplets are synthesized using the methods
described herein. The synthesis steps are similar to those
outlined above using twenty or fewer reaction vessels except
that prior to synthesis of the specified codon position,
the dividing of the supports into separate reaction vessels
for synthesis of different codons is omitted. For example,
if the codon at the second position of the oligonucleotide
is to be specified, then following synthesis of random
codons at the first position and mixing of the supports,
the mixed supports are not divided into new reaction
vessels but, instead, are contained in a single reaction
vessel to synthesize the specified codon. The specified
codon is synthesized sequentially from individual monomers
as described above. Thus, the number of reaction vessels
is increased or decreased at each step to allow for the
synthesis of a specified codon or a desired number of
random codons.
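
The vessel bookkeeping described above can be illustrated with a short sketch. It is not the synthesis protocol itself: the design notation (None for a random position, a three-letter string for a specified codon), the default of ten vessels per random position, and the particular codon chosen for each amino acid are all assumptions made for illustration.

```python
import random

# Assumed codon set: one standard codon per amino acid, chosen only for
# illustration; the codons actually used are those of Example I.
RANDOM_CODONS = [
    "GCT", "TGT", "GAT", "GAG", "TTC", "GGT", "CAT", "ATC", "AAG", "CTG",
    "ATG", "AAC", "CCG", "CAG", "CGT", "TCT", "ACT", "GTT", "TGG", "TAC",
]


def vessels_per_position(design, n_random_vessels=10):
    """Number of reaction vessels used at each codon position.

    A random position is split over n_random_vessels (assumed default: 10);
    a specified position is synthesized in a single vessel without splitting.
    """
    return [n_random_vessels if codon is None else 1 for codon in design]


def sample_oligo(design, rng):
    """Draw one oligonucleotide from the population described by design."""
    return "".join(
        rng.choice(RANDOM_CODONS) if codon is None else codon
        for codon in design
    )


if __name__ == "__main__":
    # Random first codon, specified cysteine codon at position two, random third.
    design = [None, "TGT", None]
    print(vessels_per_position(design))
    print(sample_oligo(design, random.Random(0)))
```

Every oligonucleotide drawn from this design carries the specified codon at the second position, while the flanking positions vary.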

Alternatively, a population of random left and
right precursor oligonucleotides is synthesized
essentially as described in Example I, except that at least
one predetermined codon encoding cysteine, lysine, glutamic
acid, leucine or tyrosine is incorporated into each
oligonucleotide. Combination of right and left
oligonucleotides results in a single oligonucleotide
containing at least two predetermined codons.
Alternatively, a population of random oligonucleotides is
synthesized as described in Example II, except that at
least two predetermined codons encoding cysteine, lysine,
glutamic acid, leucine or tyrosine are incorporated into
only one of the two precursor oligonucleotide populations.

Following expression of the oligonucleotides, a
peptide having a constrained secondary structure is
obtained by allowing the formation of at least one intra-
peptide covalent bond. One skilled in the art would know
the conditions necessary to allow formation of the
particular covalent bond. See, for example, Proteins:
Structures and Molecular Principles, Creighton, T.E. ed.,
W.H. Freeman and Co., New York (1984), incorporated herein
by reference. Although oligonucleotides can encode
peptides capable of forming more than one intra-peptide
covalent bond, only one such bond is necessary to form a
conformationally-constrained peptide.

The peptide libraries are expressed on the
surface of a cell, for example, a bacteriophage. Phage
expressing peptide ligands are initially identified by
panning, essentially as described in Example I, except that
the phage are first incubated in the presence of a ligand-
binding protein (in this example, an antibody), then panned
in protein A-coated dishes. Individual phage populations
are purified through three rounds of plaque purification,
essentially as described in Example I.

Two phage encoding peptides showing significantly
higher ligand binding affinity than the general phage
population are isolated, the oligonucleotide sequences are
determined and the amino acid sequences deduced. The
ligand binds with highest affinity to a twenty-two amino
acid peptide having the sequence TQSKCSTDHWLGYIEYFIMCTY
(SEQ. ID. NO.: 62). The ligand also binds with high
affinity to a peptide having the sequence
CDDQYYTDHEQGKCEVALYYTG (SEQ. ID. NO.: 63).

The above-identified peptides are each capable of
forming several intra-peptide covalent bonds. For example,
a disulfide bond may form between two cysteine residues, an
ε-(γ-glutamyl)-lysine bond may form between lysine and
glutamic acid residues, a lysinonorleucine bond may form
between lysine and leucine residues or a dityrosine bond
can form between two tyrosine residues (Devlin, Textbook of
Biochemistry 3d ed. (1992)). In addition, other peptides
can be constructed that contain, for example, four lysine
residues, which can form the heterocyclic structure of
desmosine.
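
As a rough illustration of the bond types listed above, the following sketch counts, for each of the two peptides, how many residue pairs of each kind are present. It is a purely compositional count under the pairing rules stated in the text; it says nothing about whether a given pair can actually react in the folded peptide, and the function and dictionary names are hypothetical.

```python
from collections import Counter

# Residue pairs named in the text for each covalent bond type.
BOND_PARTNERS = {
    "disulfide": ("C", "C"),
    "epsilon-(gamma-glutamyl)-lysine": ("K", "E"),
    "lysinonorleucine": ("K", "L"),
    "dityrosine": ("Y", "Y"),
}

PEPTIDES = {
    62: "TQSKCSTDHWLGYIEYFIMCTY",
    63: "CDDQYYTDHEQGKCEVALYYTG",
}


def candidate_pairs(peptide):
    """Count residue pairs that could, by composition alone, form each bond."""
    counts = Counter(peptide)
    result = {}
    for name, (a, b) in BOND_PARTNERS.items():
        if a == b:
            n = counts[a]
            result[name] = n * (n - 1) // 2  # unordered pairs of one residue
        else:
            result[name] = counts[a] * counts[b]  # one residue of each kind
    return result


if __name__ == "__main__":
    for seq_id, peptide in PEPTIDES.items():
        print(seq_id, candidate_pairs(peptide))
```

Both peptides contain exactly two cysteines, so each offers a single candidate disulfide pair alongside several other candidate pairings.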



The nature of the covalent bond in the peptide
having the sequence TQSKCSTDHWLGYIEYFIMCTY (SEQ. ID. NO.:
62) is determined by examining the effect of amino acid
substitutions on the binding affinity of the ligand, by
methods known to those skilled in the art, and described
herein. Creighton, supra, pp. 335-396, incorporated herein
by reference.

The oligonucleotide encoding this peptide is
cloned into a vector that allows secretion of the
expressed peptide. The peptide TQSKCSTDHWLGYIEYFIMCTY
(SEQ. ID. NO.: 62) is soluble at a concentration of 4
mg/ml. The same peptide, except containing the
substitution of alanine for cysteine, is insoluble at this
concentration.



EXAMPLE VI 

Binding Studies Using Conformationally Constrained
Peptides

The association constant (Ka), dissociation
constant (Kd) and affinity constant (K) were determined for
the reaction of a monoclonal antibody with the linear or
the cyclized form of a peptide, using a BIAcore automated
biosensor (Pharmacia Biosensor AB, Uppsala, Sweden), as
described by Karlsson et al., J. Immunol. Meth. 145:229-240
(1991). A 24 amino acid peptide, TQSKCSTDHWLGYIEYFIMCTYRR
(SEQ. ID. NO.: 64), which is recognized by the J2B9
monoclonal antibody, was used for these experiments. The
peptide contains two cysteine residues that form a
disulfide bond under oxidizing conditions.

The cyclized form of the peptide was immobilized
by its amino terminus to the BIAcore sensor chip and
exposed to 0.016, 0.033, 0.066, 0.13 or 2.3 nM solutions of
the J2B9 antibody. Changes in refractive index were
measured and the formulas described by Karlsson et al.,
supra, were used to calculate the following rate and
affinity constants: Ka = 3.7 x 10^5 M^-1 sec^-1; Kd = 4.5 x
10^-4 sec^-1 and K = 8.4 x 10^8 M^-1.

After the above-described measurements were
obtained, the disulfide bond was reduced by treating the
cyclized peptide with 10 mM dithiothreitol, while the
peptide was still attached to the BIAcore sensor chip. The
dissociation rate of the linear peptide and the J2B9
monoclonal antibody was then determined, as described
above.



The dissociation rate of the J2B9 antibody and
the linear peptide was calculated to be 1.54 x 10^-3 sec^-1.
Thus, the antibody dissociated from the linear peptide
three times faster than it dissociated from the cyclized
peptide. Reoxidation of the linearized peptide to reform
the cyclized peptide resulted in the dissociation rate
again decreasing to the 10^-4 range. These results show that
a conformationally constrained peptide binds a specific
receptor with greater affinity than a peptide with a less
stable secondary structure.
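
The relation between the rate constants and the affinity constant can be checked with a few lines of arithmetic. The sketch below assumes the usual definition K = Ka / Kd and assumes the exponents of the reported values read as 10^5 M^-1 sec^-1 (association), 10^-4 sec^-1 (dissociation, cyclized) and 10^-3 sec^-1 (dissociation, linear); those readings are assumptions about the printed figures, not independent measurements.

```python
def affinity_constant(k_assoc, k_dissoc):
    """Affinity constant K (M^-1) as the ratio of association to dissociation rate."""
    return k_assoc / k_dissoc


# Values as read from the text (exponents are assumed readings).
K_ASSOC = 3.7e5             # M^-1 sec^-1, cyclized peptide
K_DISSOC_CYCLIZED = 4.5e-4  # sec^-1
K_DISSOC_LINEAR = 1.54e-3   # sec^-1

if __name__ == "__main__":
    K = affinity_constant(K_ASSOC, K_DISSOC_CYCLIZED)
    ratio = K_DISSOC_LINEAR / K_DISSOC_CYCLIZED
    print(f"K (cyclized) = {K:.2e} M^-1")
    print(f"linear / cyclized dissociation rate = {ratio:.1f}")
```

Under these assumptions the computed K is of the same order as the reported affinity constant (any small difference would be consistent with rounding of the rate constants), and the ratio of dissociation rates is a little over three.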



WO 94/11496 PCT/US93/10850

EXAMPLE VII

Soluble, Conformationally-Constrained Random Peptides
Having High Affinity to An Anti-Tetanus Toxin Antibody

This example shows the synthesis and construction of
expressible random oligonucleotides encoding soluble
peptides with constrained secondary structures and the
selection of high affinity binders to an anti-tetanus
toxin antibody.

Oligonucleotide Synthesis 

Random oligonucleotides of ten codons in length were
synthesized as right and left half precursors essentially
as described in Example I. When combined, they yield an
oligonucleotide coding for twenty amino acid long random
peptides. Codons for cysteine were used to produce
peptides with a potential for forming covalent bonds for
secondary structure constraints. In contrast to
Example V, where the amino acids used for
cyclization of the peptides were placed at predetermined
positions, the cysteine codons were introduced at all
positions with a predetermined bias compared to the other
nineteen random codons.

Briefly, ten reaction vessels were used for the
synthesis of twenty random codons at each codon position
essentially as described in Example I. In addition to
the normal ten reaction vessels used for synthesis, an
extra two reaction vessels were used for the synthesis of
the two cysteine codons, TGC and TGT. Thus, the
synthesis procedure used a total of twelve reaction
vessels for the synthesis of each codon position where
the frequency of cysteine codons at each position is
twenty percent. The 5' and 3' flanking sequences for the
right and left half oligonucleotides were those described
in Example I. The use of the extra two vessels encoding
cysteine residues results in the increased frequency of
cysteine being incorporated at each codon position. This
increased frequency ensures the presence of residues
capable of forming covalent bonds for constraining the
peptide's secondary structure. Moreover, the random
incorporation of cysteines at each of the codon
positions, instead of incorporation at predetermined
positions, increases the probability of obtaining
peptides with a constrained conformation and, thus, a
high affinity toward a binding protein since a greater
number of peptides are available to screen.
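
The stated cysteine frequency of twenty percent can be reproduced with a short calculation. The sketch below assumes that the supports are divided equally among the twelve vessels, that each of the ten normal vessels yields two codons in equal amounts, and that exactly one of the twenty normal codons encodes cysteine; these are assumptions made for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def cysteine_fraction(normal_vessels=10, extra_cys_vessels=2,
                      codons_per_normal_vessel=2):
    """Fraction of cysteine codons at one position under equal splitting."""
    total = normal_vessels + extra_cys_vessels
    per_vessel = Fraction(1, total)
    # The extra vessels produce only cysteine codons.
    from_extra = extra_cys_vessels * per_vessel
    # One normal vessel devotes one of its codons to cysteine (assumed).
    from_normal = per_vessel * Fraction(1, codons_per_normal_vessel)
    return from_extra + from_normal


def prob_at_least_two(p, length=20):
    """Probability that a peptide of the given length has two or more cysteines."""
    p_none = (1 - p) ** length
    p_one = length * p * (1 - p) ** (length - 1)
    return 1 - p_none - p_one


if __name__ == "__main__":
    p = cysteine_fraction()
    print(p, float(p))
    print(float(prob_at_least_two(p)))
```

Under these assumptions the per-position cysteine fraction is 5/24, or roughly twenty-one percent, in line with the twenty percent stated above. The second function estimates how often a twenty-residue peptide would carry the two cysteines needed for a disulfide bond.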

Library Construction and Screening 

Libraries were constructed from right and left half
oligonucleotides as described in Example
I. The libraries were screened for peptides that bind to
an anti-tetanus toxin antibody essentially as described
in Example III. After two rounds of panning, eight phage
clones were selected that showed high affinity binding to
the antibody. Sequencing of the encoding nucleic acids
revealed seven peptides having cysteines spaced ten
residues apart and one peptide having cysteines
seven residues apart. The sequences are shown in Table
XIV and are listed in the sequence listing as SEQ ID
NOS: 65 through 72.

Table XIV

Conformationally Constrained Peptides Having High
Affinity for Anti-Tetanus Toxin Antibody

SEQ ID NO:    PEPTIDE SEQUENCE

65            TCLREEFILQCYIVMIEDWY
66            ICEHHQMLLQCSLVCEECMM
67            KCIIGWYTLTCYMSDRPRME
68            ACTQDMNWITCPMYCEVLCF
69            VCFYFPFKMMCHMEYIAYEY
70            DANCGHCTYMCICKIMYYIS
71            WHRHVSSFMSCWWYDQCAVA
72            CVQIDFFTVQCNISSHMFLP
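
The cysteine positions in the Table XIV peptides can be tabulated directly from the sequences. The sketch below is only a bookkeeping aid: positions are 1-based, and "spacing" is taken here as the difference between two sequence positions, which is one possible convention and an assumption of this example.

```python
TABLE_XIV = {
    65: "TCLREEFILQCYIVMIEDWY",
    66: "ICEHHQMLLQCSLVCEECMM",
    67: "KCIIGWYTLTCYMSDRPRME",
    68: "ACTQDMNWITCPMYCEVLCF",
    69: "VCFYFPFKMMCHMEYIAYEY",
    70: "DANCGHCTYMCICKIMYYIS",
    71: "WHRHVSSFMSCWWYDQCAVA",
    72: "CVQIDFFTVQCNISSHMFLP",
}


def cysteine_positions(peptide):
    """1-based positions of cysteine residues in a peptide."""
    return [i for i, aa in enumerate(peptide, start=1) if aa == "C"]


def cysteine_spacings(peptide):
    """All (first, second, difference) triples for pairs of cysteine positions."""
    pos = cysteine_positions(peptide)
    return [(a, b, b - a) for i, a in enumerate(pos) for b in pos[i + 1:]]


if __name__ == "__main__":
    for seq_id, peptide in TABLE_XIV.items():
        print(seq_id, cysteine_positions(peptide), cysteine_spacings(peptide))
```

Listing every pair, rather than only adjacent cysteines, keeps the output useful for peptides that contain more than two cysteines, where several alternative disulfide pairings are conceivable.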



Although the invention has been described with
reference to the presently preferred embodiment, it
should be understood that various modifications can be
made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Ixsys, Inc.

(ii) TITLE OF INVENTION: Soluble Peptides Having Constrained,
Secondary Conformation in Solution and Method of Making
Same.

(iii) NUMBER OF SEQUENCES: 72

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Campbell and Flores

(B) STREET: 4370 La Jolla Village Drive, Suite 700

(C) CITY: San Diego

(D) STATE: California

(E) COUNTRY: USA

(F) ZIP: 92122

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE: 10-NOV-1993

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 07/976,893

(B) FILING DATE: 10-NOV-1992

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Ronski, Antoinette F.

(B) REGISTRATION NUMBER: 34,202

(C) REFERENCE/DOCKET NUMBER: FP-IX 9769

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (619) 535-9001

(B) TELEFAX: (619) 535-8949



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7294 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGC7CGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAOGGTAA&G ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATICAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AA&CATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCXATTTT 600

OGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AkTTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG 6TATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTXATTAA CGTAGATTTT 780

TCTTCCCA&C GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAAIGATTAA AGTTGAAATT AAACCATC7C AAGCCCAATT TACTACTCGT TCTGGTGTTT 900

CTC6TCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTC TTACGTTGAT TTGGGTAATC 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTAGAC06T TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTCACC 1080

GTCTCC6CCT C6TTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGATGA TACAAATCTC CGTTCTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGA&AAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTC CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GGAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TCCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TCTTTAAGAA 1500

ATTCACCTCG AAAGGAAGCT GATAAACCGA 7ACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TCTTCCTTTC 1620

TATTCTCAC7 CCGCTGAAAC TGTTCAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680

TTTACTAACG TCTGGAAAGA CGAGAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG 7TACGGTACA 1800

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920

AOTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAAC7G TTXATACGGG CACTGTTACT 2100

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340

GGOGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400


GATTTTGATT ATGAAAAGAT GGGAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460

G&&A&C6CGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700

TTTGTCTTTA GC6CTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760

TTCOGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060

TTGTTCAGGG TCTTCAGTOA ATTCTCCCGT CTAATGCGCT TCCCT6TTTT TATGTtTATTC 3120

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180

AT!FGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTA6CTG GGTGCAAAAT AGCAACTAAT 3300

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAAIGAT 3420

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540

AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600

CGTTCTGCAT TAGCTGUkACA TGTTGTTTAT TCTCGTCGTC TGGACAGAAT TACT7TACCT 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCOTTAT 3780

ACTGGXAAGA ATTTGTATAA CGCATATGAT ACTAAAGAGG CTTTTTCTAG TAATTATGAT 3840

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG 6TCGGTATTT CAAACCATTA 3900

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200

ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 4260

GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGCGATTTT 4320

GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 4380

ACTGTOaCTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATXTCT 4440

GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 4500 

AATCCAAACA ATCAGGATTA TATTGATGAA TTGCCATCAT CTGATAATCA GGAATAXGAT 4560 

GATAATTCCG CTCCTTCTGG TGGTTTCTTT GTOCCGCAAA ATGATAATGT TACTCUUUICT 4620 

TTTAAAATTA ATAACGTTCG GGCAAAGGAT TTAATACGAG TTGTCGAATT GTTTGTAAAG 4680 

TCTAAIACTT CTAAATCCTC A&ATGTATTA TCTATTGACG GCTCTAATCT ATTAGTTCTT 4740 

AGTGCACCTA AAGATATTOT AGATAACCTT CCTCAATTCC TTTCTACTGT TGATTTCCCA 4800 

ACTGACCA6A TATTGATTGA GGGTTTGATA TTTGAGGTTC AGCAAGGTGA TGCTOTAGAT 4860 

TTTTCATTTG CTGCTGGCTC TCAGCGTGGC ACTGTTGCAG GCGGTGTTAA TACTGACCGC 4920 

CTCACCTCTG TTTTATCTOC TGCTGGTCGT TCGTTCGGTA TTTTTAATGG CGAT6TTTTA 4980 

GGGCTATCAG TTC6CGCATT AAAGACTAAT AGCCAOTCAA AAATATTGTC T6TGCCACGT 5040 

ATTCTTACGC TTTCAGGTCA GAAGGGTTCT ATCTCTGTTG GCCAGAATGT CCCTTTTATT 5100 

ACTGGTOGTG TGACTGGTGA ATCTGCCAAT GTAAATAATC CATTTCAGAC GATTGAGCGT 5160 

CAAAATCTAG 6TATTTCCAT GAGCGTTTTT CCTGTTGCAA TGGCTGGCGG TAATATTGTT 5220 

CTGGATATTA CCAGCAAGGC CGATAGTTTG AGTTCTTCTA CTCAGGCAAG TGATCTTATT 5280 

ACTAATCAAA GAAGTATTGC TACAACGGTT AATTTGCGTG ATGGACAGAC TCTTTTACTC 5340 

GGTGGCCTCA CTGATTATAA AAACACTTCT CAAGATTCTC GCGTACCGTT CCTGTCTAAA 5400 

ATCCCTTTAA TCGGCCTCCT GTTTAGCTCC C6CTCTGATT CCAACGAGGA AAGCACGTTA 5460 

TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG 5520 

TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT 5580 

CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG 5640 

GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA 5700 

TTTGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC GCCCTTOXSAC 5760 

GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTGTTCCAA ACTGGAACAA CACTCAACCC 5820 

TATCTCGGGC TATTCTTTTG ATTTATAAGG GATTTTCCCG ATTTCGGAAC CACCATCAAA 5880 

CAGGATTTTC GCCTGCTGGG GCAAACCAGC GTGGACCGCT TGCTGCAACT CTCTCAGGGC 5940 

CAGGCGGTGA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 6000 

GCGCCCAATA CGCAAACCGC CTCTCCCC6C GCGTTGGCCG ATTCATTAAT GCAGCTGGCA 6060 

CGACAGGTTT CCCGACTGGA AAGCGGGCAG TGAGCGCAAC GCAATTAATG TGAGTTAGCT 6120 

CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTCGTATGT TGTGTGGAAT 6180 

TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 6240 

GTAGGAGAGC TCGGCGGATC CTAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT 6300 

AGTTTACAGG CAAGTGCTAC TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 6360 

GTTGGTGCTA CCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 6420

GCTGGCGTAA TAGCGAAGAG GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA 6480

ATGGCGAilTG 6CGCTTTGCC TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 6540 

AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600 

ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 6660

CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720 

AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780

TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840 

TTATACAATC TTCCTGTTTT TGGGGC7TTT CTGATXATCA ACCGGGGTAC ATATGATTGA 6900 

CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960 

TGACCTGATA 6CCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC 7020 

AGCTAGAACG 6TTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080 

TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 

AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200 

TGTTTTTGGT ACAACCGACT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 

TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294 

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7320 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCIATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCA&C GTCCTGACTG GTATAATGAG CGAGTTCTTA AAATCGCATA AGGTAATTCA 840

           AGTTGAAATT AAACCATCTC AAGCCCAArTT TACTACTCGT TCTGGTGTTT 900

CTOGICAGGG GAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AAT&XCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

T6TACACCGT TCATCTGTCC TCTTTCAAAG TTGGTGAGTT CGGTTCCCTT ATGATTGACC 1080

GTCTGOGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGATGA TACAAATCTC CGTTGTACTT TCTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGHTGAG TGTTT7AGTG TATTCTTOCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTG6C21TTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTT7CGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GGAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGOGTGGGCG ATGGTTGTTG XCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

ATTCACCTCG AAAGCAA6CT GATAAACCGA TACAATXAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680

TTTACEAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA T6AGGGTTCT 1740

CTGTGGlkATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800

TGGGTTCCTA TTGGGCTTGC TATCCCXGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100

CAAGGCACTG ACCCCGTTAA AACTTATTAC GAGTACACTC CTGTATCATC AAAAGCCATG 2160

TATGACGCTT ACTGGAACGG TAAAT7CAGA GACTGOGCTT TCCATTCTGG CTTTAATGAA 2220

GATCCATTCG TTTG7GAATA TCAAGGCCAA TCGTCTGACC TCCCTCAACC TCCTGTCAAT 2280

GCTGGCGGCG GCTCTG6TGG TCGTTCTCGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG 6TGGTGGCTC TGGTTCCGGT 2400

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520

GCXGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880


TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940

TTAiUkAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000

GGCTTAACTC AATTCTTGT6 GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120

TCTCTGXAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTO! TCTTATTTGG 3180

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240

CTCGTTAGCG TTGGTAAGAT TTAGGATAIOl ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360

CTIAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTCGGCG CGGTAATGAT 3420

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540

AAATTAGGAT GGGATATTAT CTTCCTT6TT CAGGACTTAT CTATTCTTGA TAAACAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTGTCGGTA CTT7ATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATIACAT 3720

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTCAGCG 7TCGCTTTAT 3780

ACTGGTAAGA ATTTGXATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATCAT 3840

TCCGGTCTTT ATTCTTATTT AACGCCTXAT TTATCACACG GTCGGTATTT CAAACCATTA 3900

AATTTAGGTC AGAAGATCAA ATTAACTAAA ATATATTTGA AAAAGTTTTC XCGCGTTCTT 3960

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020

GAGGTTA2kAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140

AGCGACGATT TACAGAA6CA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCC6TTATT GTTTCTCCCG ATGTAAAAGG 4380

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CXACGCAATT TCTTTATTTC 4440

TGTTTTACGT GCTAATAATT TTGATATGGT TGGCTCAATT CCTTCCATAA X7CAGAAGTA 4500

TAATCCAAAC AATCAGGATT ATATTGATGA ATTCCCATCA TCTGATAATC AGGAATATGA 4560

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860

CTTTTCATTT GCOX^TGGCT CTOIGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920


CCTCACCTCT 


GTTTTATCTT 


CTGCTGGTGG 


TTCGTTCGGT 


ATTTTXAATG 


GCGATGTTTT 


4980 


AGGGCTATCA 


GTTCGCGCAT 


TAAAGACTAA 


TAGCCATTCA 


AAIkATATTGT 


CTGTGCCACG 


5040 


TATTCTTACG 


CTTTCAGGTC 


AGAAGGGTOC 


TATCTCTGTT 


GGCCAGAATG 


TCCCTTTTAT 


5100 


TACTGGTC6T 


GTGACTGGTG 


AATCTGCCAA 


TGTAAATAAT 


CCATTTCAGA 


CGATTGA6CG 


5160 


TCAAAATGTA 


GGTATTTCCA 


TGAGCGTTTT 


TCCTGTTGCA 


ATGGCTGGCG 


GTAATATTGT 


5220 


TCTGGATATT 


ACCAGCAAGG 


CCGATAGTTT 


GAGTTC7TCT 


ACTCAGGCAA 


GTGATGTTAT 


5280 


TACTAATCAA 


AGAAGTATTG 


CTACAAC6GT 


TAATTTGCGT 


GATGGACAGA 


CTCTTTTACT 


5340 


CGG7GGCCTC 


ACTGATIATA 


AAAACAC7TC 


TCAAGATTCT 


GGCGTACCGT 


TCCTGTCTAA 


5400 


AATCCCTTTA 


ATCGGCCTCC 


TGTTTAGCTC 


CCGCTCTGAT 


TCCAACGAGG 


AAAGCACG7T 


5460 


ATACGT6CTC 


GTCAAAGGAA 


CCAXAGTACG 


CGCCCTGTAG 


CGGCGCATTA 


AGCGCGGCGG 


5520 


GT6TGGTGGT 


TACGCGCAGC 


GTGACCGCTA 


GACTTGCCAG 


CGCCCTAGCG 


CCCGCTCCTT 


5580 


TCGCTTTCTT 


CCCTTCCTTT 


CTCGCCACGT 


TCGCCGGCTT 


TCCCCGTCAA 


GCTCTAAATC 


5640 


GGGGGCTCCC 


TTTAGGGTTC 


CGATTTAGTG 


CTTTACGGCA 


CCTCGACCCC 


AAAAAACTTG 


5700 


ATTTGGGTCA 


TGGTTCACGT 


AGTGGGCCAT 


CGCCCTGATA 


GACGGTTTTT 


CGCCCTTTGA 


5760 


CGTTGGAGTC 


CACGTTCTTT 


AATAGTGGAC 


TCTTGTTCCA 


AACTGGAACA 


ACACTCAACC 


5820 


CTATCTCGGG 


CTATTCTTTT 


GATTTATAAG 


GGATTTTGCC 


GATTTCGGAA 


CCACCATCAA 


5880 


ACAGGATTTT 


CGCCTGCT6G 


GGCAAACCAG 


CGTGGACCGC 


TTGCT6CAAC 


TCTCTCAGGG 


5940 


CCAGGCGGTG 


AAGGGCAATC 


AGCTGTTGCC


CGTCTCGCTG 


GTGAAAAGAA 


AAACCACCCT 


6000 


GGCGCCCAAT 


ACGCAAACCG 


CCTCTCCCCG 


CGCGTTGGCC 


GATTCATTAA


TGCAGCTGGC 


6060 


ACGACAGGTT 


TCCCGACTGG 


AAAGCGGGCA 


GTGAGCGCAA 


CGCAATTAAT 


GTGAGTTAGC 


6120 


TCACTCATTA 


GGCACCCCAG 


GCTTTACACT 


TTATGCTTCC 


GGCTCGTATG 


TTGTGTGGAA 


6180 


TTGTGAGCGG 


ATAACAATTT 


CACACGCCAA 


GGAGACAGTC 


ATAATGAAAT 


ACCTATTGCC 


6240 


TACGGCAGCC 


GCTGGATTGT 


TATTACTCGC 


TGCCCAACCA 


GCCATGGCCG 


AGCTCGTGAT 


6300 


GACCCAGACT 


CCAGAATTCC 


ATCCGGAATG 


AGTGTTAATT 


CTAGAACGCG 


TAAGCTTGGC 


6360 


ACTGGCCGTC 


GTTTTACAAC


GTCGTGACTG 


GGAAAACCCT 


GGCGTTACCC 


AACTTAATCG 


6420 


CCTTGCAGCA 


CACCCCCCTT 


TCGCCAGCTG 


GCGTAATAGC 


GAAGAGGCCC 


GCACCGATCG 


6480 


CCCTTCCCAA 


CAGTTGCGCA 


GCCTGAATGG 


CGAATGGCGC 


TTTGCCTGGT 


TTCCGGCACC 


6540 


AGAAGCGGTG 


CCGGAAAGCT 


GGCTGGAGTG 


CGATCTTCCT 


GAGGCCGATA 


CGGTCGTCGT


6600 


CCCCTCAAAC 


TGGCAGATGC 


ACGGTTACGA 


TGCGCCCATC 


TACACCAACG 


TAACCTATCC 


6660 


CATTACGGTC 


AATCCGCCGT 


TTGTTCCCAC 


GGAGAATCCG 


ACGGGTTGTT 


ACTCGCTCAC 


6720 


ATTTAATGTT 


GATGAAAGCT 


GGCTACAGGA 


AGGCCAGACG 


CGAATTATTT 


TTGATGGCGT 


6780 


TCCTATTGGT 


TAAAAAATGA 


GCTGATTTAA 


CAAAAATTTA 


ACGCGAATTT 


TAACAAAATA 


6840 


TTAACGTTTA 


GAATTTAAAT 


ATTTGCTTAT 


ACAATCTTCC 


TGTTTTTGGG


GCTTTTCTGA 


6900 
TTATCAACCG


GGGTACATAT 


GATTGACATG 


CTAGTTTTAC 


GATTACCGTT 


CATCGATTCT 


6960 


CTTGTTTGCT 


CCAGACTCTC 


AGGCAATGAC 


CTGATAGCCT 


TTGTAGATCT 


CTCAAAAATA 


7020 


GCTACCCTCT 


CCGGCATTAA 


TTTATCAGCT 


AGAACGGTTG 


AATATCATAT 


TGATGGTGAT 


7080 


TTGACTGTCT


CCGGCCTTTC 


TCACCCTTTT 


GAATCTTTAC 


CTACACATTA 


CTCAGGCATT 


7140 


GCATTTAAAA 


TATATGAGGG 


TTCTAAAAAT 


TTTTATCCTT


GCGTTGAAAT 


AAAGGCTTCT 


7200 


CCCGCAAAAG 


TATTACAGGG 


TCATAATGTT 


TTTGGTACAA 


CCGATTTAGC 


TTTATGCTCT 


7260 


GAGGCTTTAT 


TGCTTAATTT 


TGCTAATTCT 


TTGCCTTGCC 


TGTATGATTT 


ATTGGACGTT 


7320 


(2) INFORMATION FOR SEQ ID NO: 3:










(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7445 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular 








(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 








AATGCTACTA 


CTATTAGTAG


AATTGATGCC 


ACCTTTTCAG


CTCGCGCCCC 


AAATGAAAAT 


60 


ATAGCTAAAC 


AGGTTATTGA 


CCATTTGCGA 


AATGTATCTA 


ATGGTCAAAC 


TAAATCTACT 


120 


CGTTCGCAGA 


ATTGGGAATC 


AACTGTTACA


TGGAATGAAA 


CTTCCAGACA 


CCGTACTTTA 


180 


GTTGCATATT 


TAAAACATGT 


TGAGCTACAG 


CACCAGATTC 


AGCAATTAAG 


CTCTAAGCCA 


240 


TCTGCAAAAA 


TGACCTCTTA 


TCAAAAGGAG


CAATTAAAGG 


TACTCTCTAA 


TCCTGACCTG 


300 


TTGGAGTTTG 


CTTCCGGTCT 


GGTTCGCTTT 


GAAGCTCGAA 


TTAAAACGCG 


ATATTTGAAG 


360 


TCTTTCGGGC 


TTCCTCTTAA 


TCTTTTTGAT 


GCAATCCGCT 


TTGCTTCTGA 


CTATAATAGT 


420 


CAGGGTAAAG 


ACCTGATTTT 


TGATTTATGG 


TCATTCTCGT 


TTTCTGAACT 


GTTTAAAGCA 


480 


TTTGAGGGGG 


ATTCAATGAA 


TATTTATGAC 


GATTCCGCAG 


TATTGGACGC 


TATCCAGTCT 


540 


AAACATTTTA 


CTATTACCCC 


CTCTGGCAAA 


ACTTCTTTTG 


CAAAAGCCTC 


TCGCTATTTT 


600 


GGTTTTTATC 


GTCGTCTGGT 


AAACGAGGGT 


TATGATAGTG 


TTGCTCTTAC 


TATGCCTCGT 


660 


AATTCCTTTT 


GGCGTTATGT 


ATCTGCATTA 


GTTGAATGTG 


GTATTCCTAA 


ATCTCAACTG 


720 


ATGAATCTTT 


CTACCTGTAA 


TAATGTTGTT 


CCGTTAGTTC 


GTTTTATTAA 


CGTAGATTTT 


780 


TCTTCCCAAC 


GTCCTGACTG 


GTATAATGAG 


CCAGTTCTTA 


AAATCGCATA 


AGGTAATTCA 


840 


CAATGATTAA 


AGTTGAAATT 


AAACCATCTC 


AAGCCCAATT 


TACTACTCGT 


TCTGGTGTTT 


900 


CTCGTCAGGG


CAAGCCTTAT 


TCACTGAATG 


AGCAGCTTTG


TTACGTTGAT 


TTGGGTAATG 


960 


AATATCCGGT 


TCTTGTCAAG 


ATTACTCTTG 


ATGAAGGTCA 


GCCAGCCTAT 


GCGCCTGGTC 


1020 


TGTACACCGT 


TCATCTGTCC 


TCTTTCAAAG 


TTGGTCAGTT 


CGGTTCCCTT 


ATGATTGACC 


1080 


GTCTGCGCCT 


CGTTCCGGCT 


AAGTAACATG 


GAGCAGGTCG 


CGGATTTCGA 


CACAATTTAT 


1140 


CAGGCGATGA 


TACAAATCTC 


CGTTGTACTT 


TGTTTCGCGC 


TTGGTATAAT 


CGCTGGGGGT 


1200 
CAAAGATGAG


TGTTTTAGTG 


TATTCTTTCG 


CCTCTTTCGT 


TTTAGGTTGG 


TGCCTTCGTA 


1260 


GTGGCATTAC 


GTATTTTACC 


CGTTTAATGG 


AAACTTCCTC 


ATGAAAAAGT 


CTTTAGTCCT 


1320 


CAAAGCCTCT 


GTAGCCGTTG 


CTACCCTCGT 


TCCGATGCTG 


TCTTTCGCTG 


CTGAGGGTGA 


1380


CGATCCCGCA 


AAAGCGGCCT 


TTAACTCCCT 


GCAAGCCTCA 


GCGACCGAAT


ATATCGGTTA 


1440 


TGCGTGGGCG


ATGGTTGTTG


TCATTGTCGG 


CGCAACTATC 


GGTATCAAGC 


TGTTTAAGAA 


1500 


ATTCACCTCG


AAAGCAAGCT 


GATAAACCGA 


TACAATTAAA 


GGCTCCTTTT 


GGAGCCTTTT 


1560 


TTTTTGGAGA 


TTTTCAACGT 


GAAAAAATTA 


TTATTCGCAA 


TTCCTTTAGT 


TGTTCCTTTC 


1620 


TATTCTCACT 


CCGCTGAAAC 


TGTTGAAAGT 


TGTTTAGCAA 


AACCCCATAC 


AGAAAATTCA 


1680 


TTTACTAACG


TCTGGAAAGA 


CGACAAAACT 


TTAGATCGTT 


ACGCTAACTA 


TGAGGGTTGT 


1740 


CTCTGGAATG 


CTACAGGCGT 


TGTAGTTTGT


ACTGGTGACG


AAACTCAGTG 


TTACGGTACA 


1800 


TGGGTTCCTA 


TTGGGCTTGC 


TATCCCTGAA 


AATGAGGGTG 


GTGGCTCTGA 


GGGTGGCGGT 


1860 


TCTGAGGGTG 


GCGGTTCTGA 


GGGTGGCGGT 


ACTAAACCTC 


CTGAGTACGG 


TGATACACCT 


1920 


ATTCCGGGCT 


ATACTTATAT 


CAACCCTCTC 


GACGGCACTT 


ATCCGCCTGG 


TACTGAGCAA 


1980 


AACCCCGCTA 


ATCCTAATCC 


TTCTCTTGAG 


GAGTCTCAGC 


CTCTTAATAC 


TTTCATGTTT 


2040 


CAGAATAATA


GGTTCCGAAA 


TAGGCAGGGG 


GCATTAACTG 


TTTATACGGG 


CACTGTTACT 


2100 


CAAGGGACTG 


ACCCCGTTAA 


AACTTATTAC 


CAGTACACTC 


CTGTATGATC 


AAAAGCGATG 


2160 


TATGACGCTT 


ACTGGAACGG 


TAAATTGAGA 


GACTGCGCTT 


TCCATTCTGG 


CTTTAATGAA 


2220 


GATCCATTCG 


TTTGTGAATA 


TCAAGGCCAA 


TCGTCTGACC 


TGCCTCAACC 


TCCTGTCAAT 


2280 


GCTGGCGGCG 


GCTCTGGTGG 


TGGTTCTGGT 


GGCGGCTCTG 


AGGGTGGTGG 


CTCTGAGGGT 


2340 


GGCGGTTCTG 


AGGGTGGCGG 


CTCTGAGGGA 


GGCGGTTCCG 


GTGGTGGCTC 


TGGTTCCGGT 


2400 


GATTTTGATT 


ATGAAAAGAT 


GGCAAACGCT 


AATAAGGGGG 


CTATGACCGA


AAATGCCGAT 


2460 


GAAAACGCGC 


TACAGTCTGA 


CGCTAAAGGC 


AAACTTCATT 


CTGTCGCTAC 


TGATTACGGT 


2520 


GCTGCTATCG 


ATGGTTTCAT 


TGGTGACGTT 


TCCGGCCTTG 


CTAATGGTAA 


TGGTGCTACT 


2580 


GGTGATTTTG 


CTGGCTCTAA 


TTCCCAAATG 


GCTCAAGTCG 


GTGACGGTGA 


TAATTCACCT 


2640 


TTAATGAATA 


ATTTCCGTCA 


ATATTTACCT 


TCCCTCCCTC 


AATCGGTTGA 


ATGTCGCCCT 


2700 


TTTGTCTTTA 


GCGCTGGTAA


ACCATATGAA 


TTTTCTATTG 


ATTGTGACAA


AATAAACTTA 


2760 


TTCCGTGGTG 


TCTTTGCGTT 


TCTTTTATAT 


GTTGCCACCT 


TTATGTATGT 


ATTTTCTACG


2820 


TTTGCTAACA


TACTGCGTAA 


TAAGGAGTCT 


TAATCATGCC 


AGTTCTTTTG 


GGTATTCCGT 


2880 


TATTATTGCG


TTTCCTCGGT 


TTCCTTCTGG 


TAACTTTGTT 


CGGCTATCTG 


CTTACTTTTC 


2940 


TTAAAAAGGG


CTTCGGTAAG 


ATAGCTATTG 


CTATTTCATT 


GTTTCTTGCT 


CTTATTATTG 


3000 


GGCTTAACTC 


AATTCTTGTG 


GGTTATCTCT 


CTGATATTAG 


CGCTCAATTA 


CCCTCTGACT 


3060 


TTGTTCAGGG 


TGTTCAGTTA 


ATTCTCCCGT 


CTAATGCGCT


TCCCTGTTTT 


TATGTTATTC 


3120 


TCTCTGTAAA 


GGCTGCTATT 


TTCATTTTTG 


ACGTTAAACA 


AAAAATCGTT 


TCTTATTTGG 


3180 


ATTGGGATAA 


ATAATATGGC 


TGTTTATTTT 


GTAACTGGCA 


AATTAGGCTC


TGGAAAGACG 


3240 
CTCGTTAGCG


TTGGTAAGAT 


TCAGGATAAA 


ATTGTAGCTG 


GGTGCAAAAT 


AGCAACTAAT 


3300 


CTTGATTTAA 


GGCTTCAAAA 


CCTCCCGCAA 


GTCGGGAGGT 


TCGCTAAAAC 


GCCTCGCGTT 


3360 


CTTAGAATAC 


CGGATAAGCC 


TTCTATATCT 


GATTTGCTTG 


CTATTGGGCG 


CGGTAATGAT 


3420 


TCCTACGATG 


AAAATAAAAA 


CGGCTTGCTT 


GTTCTCGATG 


AGTGCGGTAC 


TTGGTTTAAT 


3480 


ACCCGTTCTT 


GGAATGATAA 


GGAAAGACAG 


CCGATTATTG 


ATTGGTTTCT 


ACATGCTCGT 


3540 


AAATTAGGAT 


GGGATATTAT 


TTTTCTTGTT 


CAGGACTTAT 


CTATTGTTGA 


TAAACAGGCG 


3600


CGTTCTGCAT 


TAGCTGAACA 


TGTTGTTTAT 


TGTCGTCGTC 


TGGACAGAAT 


TACTTTACCT 


3660 


TTTGTCGGTA 


CTTTATATTC 


TCTTATTACT 


GGCTCGAAAA 


TGCCTCTGCC 


TAAATTACAT 


3720 


GTTGGCGTTG 


TTAAATATGG 


CGATTCTCAA 


TTAAGCCCTA 


CTGTTGAGCG 


TTGGCTTTAT 


3780 


ACTGGTAAGA 


ATTTGTATAA 


CGCATATGAT 


ACTAAACAGG 


CTTTTTCTAG 


TAATTATGAT 


3840 


TCCGGTGTTT 


ATTCTTATTT 


AACGCCTTAT 


TTATCACACG 


GTCGGTATTT 


CAAACCATTA 


3900 


AATTTAGGTC 


AGAAGATGAA 


GCTTACTAAA 


ATATATTTGA 


AAAAGTTTTC 


ACGCGTTCTT 


3960 


TCTCTTGCGA 


TTGGATTTGC 


ATCAGCATTT 


ACATATAGTT 


ATATAACCCA 


ACCTAAGCCG 


4020 


GAGGTTAAAA 


AGGTAGTCTC


TCAGACCTAT 


GATTTTGATA 


AATTCACTAT 


TGACTCTTCT 


4080 


CAGCGTCTTA 


ATCTAAGCTA 


TCGCTATGTT 


TTCAAGGATT 


CTAAGGGAAA 


ATTAATTAAT 


4140 


AGCGACGATT 


TACAGAAGCA 


AGGTTATTCA 


CTCACATATA 


TTGATTTATG 


TACTGTTTCC 


4200 


ATTAAAAAAG 


GTAATTCAAA 


TGAAATTGTT 


AAATGTAATT 


AATTTTGTTT 


TCTTGATGTT 


4260 


TGTTTCATCA 


TCTTCTTTTG


CTCAGGTAAT 


TGAAATGAAT 


AATTCGCCTC 


TGCGCGATTT 


4320 


TGTAACTTGG 


TATTCAAAGC 


AATCAGGCGA 


ATCCGTTATT


GTTTCTCCCG


ATGTAAAAGG 


4380 


TACTGTTACT 


GTATATTCAT 


CTGACGTTAA 


ACCTGAAAAT 


CTACGCAATT 


TCTTTATTTC 


4440 


TGTTTTACGT 


GCTAATAATT 


TTGATATGGT 


TGGTTCAATT 


CCTTCCATAA 


TTCAGAAGTA 


4500 


TAATCCAAAC 


AATCAGGATT 


ATATTGATGA 


ATTGCCATCA 


TCTGATAATC 


AGGAATATGA 


4560 


TGATAATTCC 


GCTCCTTCTG 


GTGGTTTCTT 


TGTTCCGCAA 


AATGATAATG 


TTACTCAAAC


4620 


TTTTAAAATT 


AATAACGTTC 


GGGCAAAGGA 


TTTAATACGA


GTTGTCGAAT 


TGTTTGTAAA 


4680 


GTCTAATACT 


TCTAAATCCT 


CAAATGTATT 


ATCTATTGAC 


GGCTCTAATC 


TATTAGTTGT 


4740 


TAGTGGACCT 


AAAGATATTT 


TAGATAACCT


TCCTCAATTC 


CTTTCTACTG 


TTGATTTGCC 


4800 


AACTGACCAG 


ATATTGATTG 


AGGGTTTGAT 


ATTTGAGGTT


GAGGAAGGTG 


ATGCTTTAGA 


4860 


TTTTTCATTT 


GCTGCTGGCT 


CTCAGCGTGG 


CACTGTTGCA 


GGCGGTGTTA 


ATACTGACCG 


4920 


CCTCACCTCT 


GTTTTATCTT 


CTGCTGGTGG 


TTCGTTCGGT 


ATTTTTAATG


GCGATGTTTT 


4980


AGGGCTATCA 


GTTCGCGCAT 


TAAAGACTAA 


TAGCCATTCA 


AAAATATTGT 


CTGTGCCACG 


5040 


TATTCTTACG 


CTTTCAGGTC 


AGAAGGGTTC 


TATCTCTGTT 


GGCCAGAATG 


TCCCTTTTAT 


5100 


TACTGGTCGT 


GTGACTGGTG 


AATCTGCCAA 


TGTAAATAAT 


CCATTTCAGA 


CGATTGAGCG 


5160 


TCAAAATGTA 


GGTATTTCCA 


TGAGCGTTTT 


TCCTGTTGCA 


ATGGCTGGCG 


GTAATATTGT 


5220 


TCTGGATATT 


ACCAGCAAGG 


CCGATAGTTT 


GAGTTCTTCT 


ACTCAGGCAA 


GTGATGTTAT 


5280 
TACTAATCAA


AGAAGTATTG 


CTACAACGGT 


TAATTTGCGT 


GATGGACAGA 


CTCTTTTACT 


5340 


CGGTGGCCTC 


ACTGATTATA 


AAAACACTTC 


TCAAGATTCT 


GGCGTACCGT 


TCCTGTCTAA 


5400 


AATCCCTTTA 


ATCGGCCTCC 


TGTTTAGCTC


CCGCTCTGAT 


TCCAACGAGG 


AAAGCACGTT 


5460 


ATACGTGCTC 


GTCAAAGCAA 


CCATAGTACG 


CGCCCTGTAG 


CGGCGCATTA 


AGCGCGGCGG 


5520 


GTGTGGTGGT 


TACGCGCAGC 


GTGACCGCTA 


CACTTGCCAG 


CGCCCTAGCG 


CCCGCTCCTT 


5580 


TCGCTTTCTT 


CCCTTCCTTT 


CTCGCCACGT 


TCGCCGGCTT 


TCCCCGTCAA 


GCTCTAAATC 


5640 


GGGGGCTCCC 


TTTAGGGTTC 


CGATTTAGTC 


CTTTACGGCA 


CCTCGACCCC 


AAAAAACTTG 


5700 


ATTTGGGTGA 


TGGTTCACGT 


AGTGGGCCAT 


CGCCCTGATA 


GACGGTTTTT 


CGCCCTTTGA 


5760 


CGTTGGAGTC 


CACGTTCTTT 


AATAGTGGAC 


TCTTGTTCCA 


AACTGGAACA 


ACACTCAACC 


5820 


CTATCTCGGG 


CTATTCTTTT 


GATTTATAAG 


GGATTTTGCC 


GATTTCGGAA 


CCACCATCAA 


5880 


ACAGGATTTT 


CGCCTGCTGG 


GGCAAACCAG


CGTGGACCGC 


TTGCTGCAAC 


TCTCTCAGGG 


5940 


CCAGGCGGTG 


AAGGGCAATC 


AGCTGTTGCC 


CGTCTCGCTG 


GTGAAAAGAA 


AAACCACCCT 


6000 


GGCGCCCAAT 


ACGCAAACCG 


CCTCTCCCCG 


CGCGTTGGCC 


GATTCATTAA 


TGCAGCTGGC 


6060 


ACGACAGGTT 


TCCCGACTGG 


AAAGCGGGCA 


GTGAGCGCAA 


CGCAATTAAT 


GTGAGTTAGC 


6120 


TCACTCATTA 


GGCACCCCAG 


GCTTTACACT 


TTATGCTTCC


GGCTCGTATG 


TTGTGTGGAA 


6180 


TTGTGAGCGG 


ATAACAATTT 


CACACGCGTC 


ACTTGGCACT 


GGCCGTCGTT 


TTACAACGTC 


6240 


GTGACTGGGA 


AAACCCTGGC 


GTTACCCAAG 


CTTTGTAGAT 


GGAGAAAATA 


AAGTGAAACA 


6300 


AAGCACTATT 


GCACTGGCAC 


TCTTACCGTT 


ACCGTTACTG 


TTTACCCCTG 


TGACAAAAGC


6360 


CGCCCAGGTC 


CAGCTCCTCG 


AGTCAGGCCT 


ATTGTGCCCA


GGGGATTGTA 


CTAGTGGATC 


6420 


CTAGGCTGAA 


GGCGATGACC 


CTGCTAAGGC 


TGCATTCAAT 


AGTTTACAGG 


CAAGTGCTAC 


6480 


TGAGTACATT 


GGCTACGCTT 


GGGCTATGGT 


AGTAGTTATA 


GTTGGTGCTA 


CCATAGGGAT 


6540 


TAAATTATTC 


AAAAAGTTTA 


CGAGCAAGGC 


TTCTTAAGCA 


ATAGCGAAGA 


GGCCCGCACC 


6600 


GATCGCCCTT 


CCCAACAGTT 


GCGCAGCCTG 


AATGGCGAAT 


GGCGCTTTGC 


CTGGTTTCCG 


6660 


GCACCAGAAG 


CGGTGCCGGA 


AAGCTGGCTG


GAGTGCGATC 


TTCCTGAGGC 


CGATACGGTC


6720 


GTCGTCCCCT 


CAAACTCGCA 


GATGCACGGT 


TACGATGCGC 


CCATCTACAC 


CAACGTAACC 


6780 


TATCCCATTA 


CGGTCAATCC 


GCCGTTTGTT


CCCACGGAGA 


ATCCGACGGG 


TTGTTACTCG 


6840 


CTCACATTTA 


ATGTTGATGA


AAGCTGGCTA 


CAGGAAGGCC 


AGACGCGAAT 


TATTTTTGAT 


6900 


GGCGTTCCTA


TTGGTTAAAA 


AATGAGCTGA 


TTTAACAAAA 


ATTTAACGCG 


AATTTTAACA 


6960 


AAATATTAAC 


GTTTACAATT 


TAAATATTTG 


CTTATACAAT 


CTTCCTGTTT 


TTGGGGCTTT 


7020 


TCTGATTATC 


AACCGGGGTA 


CATATGATTG 


ACATGCTAGT 


TTTACGATTA 


CCGTTCATCG 


7080 


ATTCTCTTGT 


TTGCTCCAGA


CTCTCAGGCA 


ATGACCTGAT 


AGCCTTTGTA 


GATCTCTCAA 


7140 


AAATAGCTAC 


CCTCTCCGGC 


ATTAATTTAT 


CAGCTAGAAC 


GGTTGAATAT 


CATATTGATG 


7200 


GTGATTTGAC 


TGTCTCCGGC 


CTTTCTCACC 


CTTTTGAATC 


TTTACCTACA 


CATTACTCAG 


7260 


GCATTGCATT 


TAAAATATAT


GAGGGTTCTA 


AAAATTTTTA 


TCCTTGCGTT 


GAAATAAAGG 


7320 
CTTCTCCCGC AAAAGTATTA CAGGGTCATA ATGTTTTTGG TACAACCGAT TTAGCTTTAT 7380
GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTCTAT GATTTATTGG 7440
ACGTT 7445
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7409 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



AATGCTACTA 


CTATTAGTAG 


AATTGATGCC 


ACCTTTTCAG 


CTCGCGCCCC 


AAATGAAAAT 


60 


ATAGCTAAAC 


AGGTTATTGA 


CCATTTGCGA 


AATGTATCTA ATGGTCAAAC 


TAAATCTACT 


120 


CGTTCGCAGA 


ATTGGGAATC 


AACTGTTACA 


TGGAATGAAA 


CTTCCAGACA 


CCGTACTTTA 


180 


GTTGCATATT 


TAAAACATGT 


TGAGCTACAG 


CACCAGATTC 


AGCAATTAAG 


CTCTAAGCCA 


240 


TCTGCAAAAA 


TGACCTCTTA 


TCAAAAGGAG 


CAATTAAAGG 


TACTCTCTAA 


TCCTGACCTG 


300 


TTGGAGTTTG 


CTTCCGGTCT 


GGTTCGCTTT 


GAAGCTCGAA 


TTAAAACGCG 


ATATTTGAAG 


360 


TCTTTCGGGC 


TTCCTCTTAA 


TCTTTTTGAT 


GCAATCCGCT 


TTGCTTCTGA 


CTATAATAGT 


420 


CAGGGTAAAG 


ACCTGATTTT 


TGATTTATGG 


TCATTCTCGT 


TTTCTGAACT 


GTTTAAAGCA 


480 


TTTGAGGGGG 


ATTCAATGAA 


TATTTATGAC 


GATTCCGCAG


TATTGGACGC 


TATCCAGTCT 


540 


AAACATTTTA 


CTATTACCCC 


CTCTGGCAAA 


ACTTCTTTTG 


CAAAAGCCTC 


TCGCTATTTT 


600 


GGTTTTTATC 


GTCGTCTGGT 


AAACGAGGGT 


TATGATAGTG 


TTGCTCTTAC 


TATGCCTCGT 


660 


AATTCCTTTT 


GGCGTTATGT 


ATCTGCATTA 


GTTGAATGTG 


GTATTCCTAA 


ATCTCAACTG 


720 


ATGAATCTTT 


CTACCTGTAA 


TAATGTTGTT 


CCGTTAGTTC 


GTTTTATTAA


CGTAGATTTT 


780 


TCTTCCCAAC 


GTCCTGACTG 


GTATAATGAG 


CCAGTTCTTA AAATCGCATA AGGTAATTCA 


840 


CAATGATTAA 


AGTTGAAATT 


AAACCATCTC 


AAGCCCAATT 


TACTACTCGT 


TCTGGTGTTT


900 


CTCGTCAGGG 


CAAGCCTTAT 


TCACTGAATG 


AGCAGCTTTG 


TTACGTTGAT 


TTGGGTAATG 


960 


AATATCCGGT 


TCTTGTCAAG 


ATTACTCTTG 


ATGAAGGTCA GCCAGCCTAT 


GCGCCTGGTC 


1020 


TGTACACCGT 


TCATCTGTCC 


TCTTTCAAAG 


TTGGTCAGTT 


CGGTTCCCTT 


ATGATTGACC 


1080 


GTCTGCGCCT 


CGTTCCGGCT 


AAGTAACATG


GAGCAGGTCG 


CGGATTTCGA 


CACAATTTAT 


1140 


CAGGCGATGA 


TACAAATCTC 


CGTTGTACTT 


TGTTTCGCGC 


TTGGTATAAT 


CGCTGGGGGT


1200 


CAAAGATGAG 


TGTTTTAGTG 


TATTCTTTCG 


CCTCTTTCGT 


TTTAGGTTGG 


TGCCTTCGTA 


1260 


GTGGCATTAC 


GTATTTTACC 


CGTTTAATGG 


AAACTTCCTC ATGAAAAAGT 


CTTTAGTCCT 


1320 


CAAAGCCTCT 


GTAGCCGTTG 


CTACCCTCGT 


TCCGATGCTG 


TCTTTCGCTG 


CTGAGGGTGA


1380 


CGATCCCGCA 


AAAGCGGCCT 


TTAACTCCCT 


GCAAGCCTCA GCGACCGAAT 


ATATCGGTTA 


1440 


TGCGTGGGCG 


ATGGTTGTTG 


TCATTGTCGG 


CGCAACTATC 


GGTATCAAGC 


TGTTTAAGAA 


1500 
ATTCACCTCG


AAAGCAAGCT 


GATAAACCGA 


TACAATTAAA 


GGCTCCTTTT 


GGAGCCTTTT 


1560 


TTTTTGGAGA 


TTTTCAACGT 


GAAAAAATTA


TTATTCGCAA 


TTCCTTTAGT 


TGTTCCTTTC 


1620 


TATTCTCACT 


CCGCTGAAAC 


TGTTGAAAGT 


TGTTTAGCAA 


AACCCCATAC 


AGAAAATTCA 


1680 


TTTACTAACG 


TCTGGAAAGA 


CGACAAAACT 


TTAGATCGTT 


ACGCTAACTA 


TGAGGGTTGT 


1740 


CTGTGGAATG 


CTACAGGCGT 


TGTAGTTTGT 


ACTGGTGACG 


AAACTCAGTG 


TTACGGTACA 


1800 


TGGGTTCCTA


TTGGGCTTGC 


TATCCCTGAA 


AATGAGGGTG 


GTGGCTCTGA 


GGGTGGCGGT 


1860 


TCTGAGGGTG 


GCGGTTCTGA 


GGGTGGCGGT 


ACTAAACCTC 


CTGAGTACGG 


TGATACACCT 


1920 


ATTCCGGGCT 


ATACTTATAT 


CAACCCTCTC 


GACGGCACTT 


ATCCGCCTGG 


TACTGAGCAA 


1980 


AACCCCGCTA 


ATCCTAATCC 


TTCTCTTGAG 


GAGTCTCAGC 


CTCTTAATAC 


TTTCATGTTT 


2040 


CAGAATAATA 


GGTTCCGAAA


TAGGCAGGGG 


GCATTAACTG


TTTATACGGG 


CACTGTTACT 


2100 


CAAGGCACTG


ACCCCGTTAA 


AACTTATTAC 


CAGTACACTC 


CTGTATCATC 


AAAAGCCATG 


2160 


TATGACGCTT 


ACTGGAACGG 


TAAATTCAGA


GACTGCGCTT


TCCATTCTGG


CTTTAATGAA 


2220 


GATCCATTCG


TTTGTGAATA 


TCAAGGCCAA 


TCGTCTGACC 


TGCCTCAACC 


TCCTGTCAAT 


2280 


GCTGGCGGCG 


GCTCTGGTGG 


TGGTTCTGGT 


GGCGGCTCTG 


AGGGTGGTGG 


CTCTGAGGGT 


2340 


GGCGGTTCTG 


AGGGTGGCGG 


CTCTGAGGGA 


GGCGGTTCCG 


GTGGTGGCTC 


TGGTTCCGGT 


2400 


GATTTTGATT 


ATGAAAAGAT 


GGCAAACGCT 


AATAAGGGGG 


CTATGACCGA 


AAATGCCGAT 


2460 


GAAAACGCGC 


TAGAGTCTGA 


CGCTAAAGGC 


AAACTTGATT


CTGTCGCTAC 


TGATTACGGT 


2520 


GCTGCTATCG 


ATGGTTTCAT 


TGGTGACGTT


TCCGGCCTTG 


CTAATGGTAA 


TGGTGCTACT 


2580 


GGTGATTTTG 


CTGGCTCTAA 


TTCCCAAATG 


GCTCAAGTCG 


GTGACGGTGA 


TAATTCACCT 


2640 


TTAATCAATA 


ATTTCCGTGA 


ATATTTACCT 


TCCCTCCCTC 


AATCGGTTGA 


ATGTCGCCCT 


2700 


TTTGTCTTTA 


GCGCTGGTAA 


ACCATATGAA 


TTTTCTATTG 


ATTGTGACAA 


AATAAACTTA 


2760 


TTCCGTGGTG 


TCTTTGCGTT 


TCTTTTATAT 


GTTGCCACCT 


TTATGTATGT 


ATTTTCTACG 


2820 


TTTGCTAACA 


TACTGCGTAA 


TAAGGAGTCT


TAATCATGCC 


AGTTCTTTTG 


GGTATTCCGT 


2880 


TATTATTGCG 


TTTCCTCGGT 


TTCCTTCTGG


TAACTTTGTT


CGGCTATCTC 


CTTACTTTTC 


2940 


TTAAAAAGGG 


CTTCGGTAAG 


ATAGCTATTG 


CTATTTCATT 


GTTTCTTGCT 


CTTATTATTG 


3000 


GGCTTAACTC 


AATTCTTGTG 


GGTTATCTCT 


CTGATATTAG 


CGCTGAATTA 


CCCTCTGACT 


3060 


TTGTTCAGGG 


TGTTCAGTTA 


ATTCTCCCGT 


CTAATGCGCT


TCCCTGTTTT 


TATGTTATTC 


3120 


TCTCTGTAAA 


GGCTGCTATT


TTCATTTTTG


ACGTTAAACA


AAAAATCGTT


TCTTATTTGG 


3180 


ATTGGGATAA 


ATAATATGGC 


TGTTTATTTT 


GTAACTGGCA 


AATTAGGCTC


TGGAAAGACG 


3240 


CTCGTTAGCG 


TTGGTAAGAT 


TCAGGATAAA 


ATTGTAGCTG 


GGTGCAAAAT 


AGCAACTAAT 


3300 


CTTGATTTAA 


GGCTTCAAAA


CCTCCCGCAA 


GTCGGGAGGT 


TCGCTAAAAC 


GCCTCGCGTT 


3360 


CTTAGAATAC 


CGGATAAGCC


TTCTATATCT 


GATTTGCTTG 


CTATTGGGCG 


CGGTAATGAT


3420 


TCCTACGATG 


AAAATAAAAA 


CGGCTTGCTT 


GTTCTCGATG 


AGTGCGGTAC 


TTGGTTTAAT


3480 


ACCCGTTCTT 


GGAATGATAA 


GGAAAGACAG 


CCGATTATTG


ATTGGTTTCT 


ACATGCTCGT 


3540 
AAATTAGGAT


GGGATATTAT 


TTTTCTTGTT 


CAGGACTTAT 


CTATTGTTGA 


TAAACAGGCG 


3600 


CGTTCTGCAT 


TAGCTGAACA 


TGTTGTTTAT 


TGTCGTCGTC 


TGGACAGAAT 


TACTTTACCT 


3660 


TTTGTCGGTA


CTTTATATTC 


TCTTATTACT 


GGCTCGAAAA 


TGCCTCTGCC 


TAAATTACAT 


3720 


GTTGGCGTTG 


TTAAATATGG 


CGATTCTCAA


TTAAGCCCTA


CTGTTGAGCG 


TTGGCTTTAT


3780


ACTGGTAAGA 


ATTTGTATAA 


CGCATATGAT


ACTAAACAGG 


CTTTTTCTAG 


TAATTATGAT 


3840 


TCCGGTGTTT 


ATTCTTATTT 


AACGCCTTAT 


TTATCACACG 


GTCGGTATTT


CAAACCATTA 


3900


AATTTAGGTC 


AGAAGATGAA 


GCTTAGTAAA 


ATATATTTGA 


AAAAGTTTTC 


ACGCGTTCTT 


3960 


TGTCTTGCGA 


TTGGATTTGC 


ATCAGCATTT 


ACATATAGTT 


ATATAACCCA 


ACCTAAGCCG


4020 


GAGGTTAAAA 


AGGTAGTCTC 


TCAGACCTAT 


GATTTTGATA 


AATTCACTAT 


TGACTCTTCT 


4080 


CAGCGTCTTA 


ATCTAAGCTA 


TCGCTATGTT 


TTCAAGGATT


CTAAGGGAAA 


ATTAATTAAT 


4140 


AGCGACGATT


TACAGAAGCA 


AGGTTATTCA 


CTCACATATA 


TTGATTTATG 


TACTGTTTCC 


4200 


ATTAAAAAAG 


GTAATTCAAA 


TGAAATTGTT


AAATGTAATT 


AATTTTCTTT 


TCTTGATGTT


4260 


TGTTTCATCA 


TCTTCTTTTG 


CTCAGGTAAT 


TGAAATGAAT 


AATTCGCCTC 


TGCGCGATTT


4320 


TGTAACTTGG


TATTCAAAGC 


AATCAGGCGA 


ATCCGTTATT 


GTTTCTCCCG 


ATGTAAAAGG 


4380 


TACTGTTACT 


GTATATTCAT 


CTGACGTTAA 


ACCTGAAAAT 


CTACGGAATT 


TCTTTATTTC 


4440 


TGTTTTACGT


GCTAATAATT


TTGATATGGT 


TGGTTCAATT


CCTTCGATAA 


TTCAGAAGTA 


4500 


TAATCCAAAC 


AATCAGGATT 


ATATTGATGA 


ATTGCCATCA 


TCTGATAATC 


AGGAATATGA 


4560 


TGATAATTCC 


GCTCCTTCTG 


GTGGTTTCTT


TGTTCCGCAA 


AATGATAATG 


TTACTCAAAC 


4620 


TTTTAAAATT 


AATAACGTTC 


GGGCAAAGGA 


TTTAATACGA 


GTTCTCGAAT


TGTTTGTAAA 


4680 


GTCTAATACT 


TCTAAATCCT 


CAAATGTATT 


ATCTATTGAC 


GGCTCTAATC 


TATTAGTTGT 


4740 


TAGTGCACCT 


AAAGATATTT 


TAGATAACCT


TCCTCAATTC 


CTTTCTACTG 


TTGATTTGCC 


4800 


AACTGACCAG 


ATATTGATTG 


AGGGTTTGAT 


ATTTCAGGTT 


CAGCAAGGTG 


ATGCTTTAGA 


4860 


TTTTTCATTT 


GCTGCTGGCT 


CTCAGCGTGG 


CACTGTTGCA 


GGCGGTGTTA


ATACTGACCG 


4920 


CCTCACCTCT 


GTTTTATCTT 


CTGCTGGTGG


TTCGTTCGGT 


ATTTTTAATG


GCGATGTTTT 


4980 


AGGGCTATCA 


GTTCGCGCAT 


TAAAGACTAA 


TAGCCATTCA 


AAAATATTGT 


CTGTGCCACG 


5040 


TATTCTTACG 


CTTTCAGGTC 


AGAAGGGTTC 


TATCTCTGTT 


GGCCAGAATG 


TCCCTTTTAT 


5100 


TACTGGTCGT 


GTGACTGGTG 


AATCTGCCAA 


TGTAAATAAT 


CCATTTCAGA 


CGATTGAGCG 


5160 


TCAAAATGTA


GGTATTTCCA 


TGAGCGTTTT 


TCCTGTTGCA 


ATGGCTGGCG 


GTAATATTGT 


5220 


TCTGGATATT 


ACCAGGAAGG 


CCGATAGTTT 


GAGTTCTTCT 


ACTCAGGCAA 


GTGATGTTAT 


5280 


TACTAATCAA 


AGAAGTATTG 


CTACAACGGT


TAATTTGCGT 


GATGGACAGA 


CTCTTTTACT 


5340


CGGTGGCCTC 


ACTGATTATA 


AAAACACTTC 


TCAAGATTCT


GGCGTACCGT 


TCCTGTCTAA 


5400 


AATCCCTTTA 


ATCGGCCTCC 


TGTTTAGCTC 


CCGCTCTGAT 


TCCAACGAGG 


AAAGCACGTT 


5460 


ATACGTGCTC 


GTCAAAGCAA 


CCATAGTACG 


CGCCCTGTAG 


CGGCGCATTA 


AGCGCGGCGG 


5520 


GTGTGGTGGT 


TACGCGCAGC 


GTGACCGCTA 


CACTTGCCAG 


CGCCCTAGCG 


CCCGCTCCTT 


5580 
TCGCTTTCTT


CCCTTCCTTT 


CTCGCCACGT 


TCGCCGGCTT 


TCCCCGTCAA GCTCTAAATC


5640 


GGGGGCTCCC 


TTTAGGGTTC 


CGATTTAGTG 


CTTTACGGCA CCTCGACCCC AAAAAACTTG 


5700 


ATTTGGGTGA 


TGGTTCACGT 


AGTGGGCCAT 


CGCCCTGATA 


GACGGTTTTT 


CGCCCTTTGA 


5760 


CGTTGGAGTC


CACGTTCTTT 


AATAGTGGAC 


TCTTGTTCCA AACTCGAACA ACACTCAACC 


5820 


CTATCTCGGG 


CTATTCTTTT 


GATTTATAAG 


GGATTTTGCC 


GATTTCGGAA 


CCACCATCAA 


5880 


ACAGGATTTT 


CGCCTCCTGG 


GGCAAACCAG 


CGTGGACCGC 


TTGCTGCAAC 


TCTCTCAGGG 


5940 


CCAGGCGGTG 


AAGGGCAATC 


AGCTGTTGCC 


CGTCTCGCTG 


GTGAAAAGAA 


AAACCACCCT 


6000 


GGCGCCCAAT 


ACGCAAACCG 


CCTCTCCCCG 


CGCGTTGGCC 


GATTCATTAA 


TGCAGCTGGC 


6060 


ACGACAGGTT 


TCCCGACTGG 


AAAGCGGGCA


GTGAGCGCAA 


CGCAATTAAT 


GTGAGTTAGC 


6120 


TCACTCATTA 


GGCACCCCAG 


GCTTTACACT 


TTATGCTTCC 


GGCTCGTATG


TTGTGTGGAA 


6180 


TTGTGAGCGG 


ATAACAATTT 


CACACGCGTC 


ACTTGGCACT 


GGCCGTCGTT


TTACAACGTC 


6240 


GTGACTGGGA AAACCCTGGC


GTTACCCAAG 


CTTTGTACAT 


GGAGAAAATA AAGTGAAACA 


6300 


AAGCACTATT 


GCACTGGCAC 


TCTTACCGTT


ACTGTTTACC 


CCTGTGGCAA AAGCCTATGG 


6360 


GGGGTTTATG 


ACTTCTGAGG 


GATCCGGAGC 


TGAAGGCGAT


GACCCTGCTA AGGCTGCATT 


6420 


CAATAGTTTA 


CAGGCAAGTG 


CTACTGAGTA 


CATTGGCTAC 


GCTTGGGCTA 


TGGTAGTAGT 


6480 


TATAGTTGGT 


GCTACCATAG 


GGATTAAATT


ATTCAAAAAG 


TTTACGAGCA 


AGGCTTCTTA 


6540 


AGCAATAGCG 


AAGAGGCCCG 


CACCGATCGC 


CCTTCCCAAC 


AGTTGCGCAG 


CCTGAATGGC 


6600 


GAATGGCGCT 


TTGCCTGGTT 


TCCGGCACCA 


GAAGCGGTGC 


CGGAAAGCTG


GCTGGAGTGC 


6660 


GATCTTCCTG 


AGGCCGATAC 


GGTCGTCGTC


CCCTCAAACT 


GGCAGATGCA 


CGGTTACGAT 


6720 


GCGCCCATCT 


ACACCAACGT 


AACCTATCCC


ATTACGGTCA ATCCGCCGTT 


TGTTCCCACG 


6780 


GAGAATCCGA 


CGGGTTGTTA 


CTCGCTCACA 


TTTAATGTTG 


ATGAAAGCTG 


GCTACAGGAA 


6840 


GGCCAGACGC 


GAATTATTTT 


TGATGGCGTT 


CCTATTGGTT AAAAAATGAG


CTGATTTAAC 


6900 


AAAAATTTAA 


CGCGAATTTT 


AACAAAATAT 


TAACGTTTAC 


AATTTAAATA 


TTTGCTTATA 


6960 


CAATCTTCCT 


GTTTTTGGGG 


CTTTTCTGAT 


TATCAACCGG 


GGTACATATG


ATTGACATGC 


7020 


TAGTTTTACG 


ATTACCGTTC 


ATCGATTCTC 


TTGTTTGCTC 


CAGACTCTCA 


GGCAATGACC 


7080 


TGATAGCCTT 


TGTAGATCTC 


TCAAAAATAG 


CTACCCTCTC 


CGGCATTAAT 


TTATCAGCTA 


7140 


GAACGGTTGA 


ATATCATATT 


GATGGTGATT


TGACTGTCTC 


CGGCCTTTCT 


CACCCTTTTG 


7200 


AATCTTTACC 


TACACATTAC 


TCAGGCATTG 


CATTTAAAAT 


ATATGAGGGT 


TCTAAAAATT


7260 


TTTATCCTTG 


CGTTGAAATA AAGGCTTCTC 


CCGCAAAAGT 


ATTACAGGGT 


CATAATGTTT 


7320 


TTGGTACAAC 


CGATTTAGCT 


TTATGCTCTG 


AGGCTTTATT 


GCTTAATTTT 


GCTAATTCTT 


7380 


TGCCTTGCCT 


GTATGATTTA 


TTGGACGTT 








7409 
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7294 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



AATGCTACTA 


CTATTAGTAG 


AATTGATGCC 


ACCTTTTCAG 


CTCGCGCCCC 


AAATGAAAAT 


60 


ATAGCTAAAC 


AGGTTATTGA 


CCATTTGCGA 


AATGTATCTA 


ATGGTCAAAC 


TAAATCTACT 


120 


CGTTCGCAGA 


ATTGGGAATC 


AACTGTTACA 


TGGAATGAAA 


CTTCCAGACA 


CCGTACTTTA 


180 




TAAAACATGT 


TGAGCTACAG 


CACCAGATTC 


AGCAATTAAG 


CTCTAAGCCA 


240 






TCAAAAGGAG 


CAATTAAAGG 


TACTCTCTAA


TCCTGACCTG 


300 




w X X www X W X 




GAAGCTCGAA


TTAAAACGCG 


ATATTTGAAG 


360 


TCTTTCGGGC 


X XW\»XA»XX4Wi 


Xwx XXX XwnX 


wWJUiX WWVw X 


X XAOwx xwxwn 




420 




AWWXAAAX XXX 


XUtAX X X AXAaU 


X WAX X W X VvVI X 


X X xwxuunnw A 


GTTTAAAGCA


480 


TTTGAGGGGG


ATTCAATGAA 

X \«was»X Was* 


X**^ X x*»x 


GATTCCGCAG 


TATTGGACGC 


TATCCAGTCT 


540 


AAACATTTTA 


CTATTACCCC 


CTCTGGCAAA 


ACTTCTTTTG 


CAAAAGCCTC 


TCGCTATTTT 


600 


GGTTTTTATC


GTCGTCTGGT 


AAACGAGGGT 


TATGATAGTG 


TTGCTCTTAC 


TATGCCTCGT 


660 


AATTCCTTTT 


GGCGTTATGT 


ATCTGCATTA 


GTTGAATGTG 


GTATTCCTAA 


ATCTCAACTG 


720 


ATGAATCTTT 


CTACCTGTAA 


TAATGTTGTT 


CCGTTAGTTC 


GTTTTATTAA 


CGTAGATTTT 


780 


TCTTCCCAAC 


GTCCTGACTG 


GTATAATGAG 


CCAGTTCTTA 


AAATCGCATA 


AGGTAATTCA 


840 


CAATGATTAA 


AGTTGAAATT 


AAACCATCTC 


AAGCCCAATT 


TACTACTCGT 


TCTGGTGTTT 


900 


CTCGTCAGGG 


CAAGCCTTAT 


TCACTGAATG 


AGCAGCTTTG 


TTACGTTGAT 


TTGGGTAATG 


960 


AATATCCGGT


TCTTGTCAAG 


ATTACTCTTG 


ATGAAGGTCA 


GCCAGCCTAT 


GCGCCTGGTC 


1020 


TGTACACCGT 


TCATCTGTCC 


TCTTTCAAAG 


TTGGTCAGTT 


CGGTTCCCTT


ATGATTGACC 


1080 


GTCTGCGCCT


CGTTCCGGCT 


AAGTAACATG 


GAGCAGGTCG 


CGGATTTCGA 


CACAATTTAT 


1140 


CAGGCGATGA 


TACAAATCTC 


CGTTGTACTT 


TGTTTCGCGC 


TTGGTATAAT 


CGCTGGGGGT 


1200 


CAAAGATGAG 


TGTTTTAGTG 


TATTCTTTCG 


CCTCTTTCGT 


TTTAGGTTGG 


TGCCTTCGTA 


1260 


GTGGCATTAC 


GTATTTTACC 


CGTTTAATGG 


AAACTTCCTC 


ATGAAAAAGT 


CTTTAGTCCT 


1320 


CAAAGCCTCT 


GTAGCCGTTG 


CTACCCTCGT 


TCCGATGCTG 


TCTTTCGCTG 


CTGAGGGTGA 


1380 


CGATCCCGCA 


AAAGCGGCCT 


TTAACTCCCT 


GCAAGCCTCA


GCGACCGAAT 


ATATCGGTTA 


1440


TGCGTGGGCG 


ATGGTTGTTG 


TCATTGTCGG 


CGCAACTATC 


GGTATCAAGC 


TGTTTAAGAA 


1500 


ATTCACCTCG 


AAAGCAAGCT 


GATAAACCGA 


TACAATTAAA 


GGCTCCTTTT 


GGAGCCTTTT 


1560 


TTTTTGGAGA 


TTTTCAACGT 


GAAAAAATTA 


TTATTCGCAA 


TTCCTTTAGT 


TGTTCCTTTC 


1620 


TATTCTCACT 


CCGCTGAAAC 


TGTTGAAAGT


TGTTTAGCAA 


AACCCCATAC 


AGAAAATTCA 


1680



WO 94/11496



PCT/US93/10850



95 



TTTACTAACG 


TCTGGAAAGA 


CGACAAAACT 


TTAGATCGTT 


ACGCTAACTA 


TGAGGGTTGT 


1740 


CTGTGGAATG 


CTACAGGCGT 


TGTAGTTTGT 


ACTGGTGACG 


AAACTCAGTG 


TTACGGTACA 


1800 


TGGGTTCCTA 


TTGGGCTTGC 


TATCCCTGAA 


AATGAGGGTG 


GTGGCTCTGA


GGGTGGCGGT 


1860 


TCTGAGGGTG


GCGGTTCTGA 


GGGTGGCGGT 


ACTAAACCTC


CTGAGTACGG 


TGATACACCT 


1920 


ATTCCGGGCT 


ATACTTATAT 


CAACCCTCTC 


GACGGCACTT 


ATCCGCCTGG 


TACTGAGCAA 


1980 


AACCCCGCTA 


ATCCTAATCC 


TTCTCTTGAG 


GAGTCTCAGC 


CTCTTAATAC 


TTTCATGTTT 


2040 


CAGAATAATA 


GGTTCCGAAA 


TAGGCAGGGG 


GCATTAACTG 


TTTATACGGG 


CACTGTTACT 


2100 


CAAGGCACTG 


ACCCCGTTAA 


AACTTATTAC 


CAGTACACTC 


CTGTATCATC 


AAAAGCCATG 


2160 


TATGACGCTT 


ACTGGAACGG 


TAAATTCAGA 


GACTGCGCTT 


TCCATTCTGG 


CTTTAATGAA 


2220 


GATCCATTCG 


TTTGTGAATA 


TCAAGGCCAA 


TCGTCTGACC 


TGCCTCAACC 


TCCTGTCAAT 


2280 


GCTGGCGGCG


GCTCTGGTGG 


TGGTTCTGGT 


GGCGGCTCTG


AGGGTGGTGG 


CTCTGAGGGT 


2340 


GGCGGTTCTG 


AGGGTGGCGG 


CTCTGAGGGA 


GGCGGTTCCG


GTGGTGGCTC 


TGGTTCCGGT 


2400 


GATTTTGATT 


ATGAAAAGAT 


GGCAAACGCT 


AATAAGGGGG


CTATGACCGA 


AAATGCCGAT


2460 


GAAAACGCGC 


TACAGTCTGA 


CGCTAAAGGC 


AAACTTGATT 


CTGTCGCTAC 


TGATTACGGT 


2520 


GCTGCTATCG 


ATGGTTTCAT 


TGGTGACGTT 


TCCGGCCTTG 


CTAATGGTAA 


TGGTGCTACT 


2580 


GGTGATTTTG 


CTGGCTCTAA 


TTCCCAAATG


GCTCAAGTCG 


GTGACGGTGA 


TAATTCACCT 


2640 


TTAATGAATA 


ATTTCCGTCA 


ATATTTACCT 


TCCCTCCCTC 


AATCGGTTGA 


ATGTCGCCCT 


2700 


TTTCTCTTTA 


GCGCTGGTAA


ACCATATGAA 


TTTTCTATTG 


ATTGTGACAA


AATAAACTTA


2760 


TTCCGTGGTG 


TCTTTGCGTT


TCTTTTATAT 


GTTGCCACCT


TTATGTATGT 


ATTTTCTACG 


2820 


TTTGCTAACA 


TACTGCGTAA


TAAGGAGTCT 


TAATGATGCC 


AGTTCTTTTG 


GGTATTCCGT 


2880 


TATTATTGCG 


TTTCCTCGGT 


TTCCTTCTGG 


TAACTTTGTT 


CGGCTATCTG 


CTTACTTTTC 


2940 


TTAAAAAGGG 


CTTCGGTAAG 


ATAGCTATTG 


CTATTTCATT 


GTTTCTTGCT 


CTTATTATTG 


3000 


GGCTTAACTC 


AATTCTTGTG 


GGTTATCTCT 


CTGATATTAG 


TCCTCAATTA 


CCCTCTGACT 


3060 


TTGTTCAGGG 


TGTTCAGTTA 


ATTCTCCCGT 


CTAATGCGCT 


TCCCTGTTTT 


TATGTTATTC 


3120 


TCTCTGTAAA 


GGCTGCTATT 


TTCATTTTTG 


ACGTTAAACA 


AAAAATCGTT 


TCTTATTTGG 


3180 


ATTGGGATAA 


ATAATATGGC 


TGTTTATTTT 


GTAACTGGCA 


AATTAGGCTC 


TGGAAAGACG 


3240 


CTCGTTAGCG 


TTGGTAAGAT 


TCAGGATAAA 


ATTGTAGCTG 


GGTGCAAAAT 


AGCAACTAAT 


3300 


CTTGATTTAA 


GGCTTCAAAA 


CCTCCCGCAA 


GTCGGGAGGT 


TCGCTAAAAC 


GCCTCGCGTT 


3360 


CTTAGAATAC 


CGGATAAGCC 


TTCTATATCT 


GATTTGCTTG 


CTATTGGGCG 


CGGTAATGAT 


3420 


TCCTACGATG 


AAAATAAAAA 


CGGCTTGCTT 


GTTCTCGATG


AGTGCGGTAC


TTGGTTTAAT 


3480 


ACCCGTTCTT


GGAATGATAA 


GGAAAGACAG 


CCGATTATTG 


ATTGGTTTCT 


ACATGCTCGT 


3540 


AAATTAGGAT 


GGGATATTAT 


CTTCCTTGTT 


CAGGACTTAT 


CTATTGTTGA 


TAAACAGGCG 


3600 


CGTTCTGCAT 


TAGCTGAACA 


TGTTGTTTAT 


TGTCGTCGTC 


TGGACAGAAT 


TACTTTACCT 


3660 


TTTGTCGGTA


CTTTATATTC 


TCTTATTACT 


GGCTCGAAAA 


TGCCTCTGCC 


TAAATTACAT 


3720 
GTTGGCGTTG


TTAAATATGG 


CGATTCTCAA 


TTAAGCCCTA 


CTGTTGAGCG 


TTGGCTTTAT 


3780 


ACTGGTAAGA


ATTTGTATAA


CGCATATGAT 


ACTAAACAGG 


CTTTTTCTAG 


TAATTATGAT 


3840 


TCCGGTGTTT 


ATTCTTATTT 


AACGCCTTAT


TTATCACACG 


GTCGGTATTT 


CAAACCATTA


3900 


AATTTAGGTC 


AGAAGATGAA 


GCTTACTAAA 


ATATATTTGA 


AAAAGTTTTC 


ACGCGTTCTT 


3960^ 


TGTCTTGCGA 


TTGGATTTGC 


ATCAGCATTT 


ACATATAGTT 


ATATAACCCA 


ACCTAAGCCG 


4020 


GAGGTTAAAA 


AGGTAGTCTC 


TCAGACCTAT 


GATTTTGATA 


AATTCACTAT 


TGACTCTTCT 


4080. 


CAGCGTCTTA 


ATCTAAGCTA 


TCGCTATGTT 


TTCAAGGATT 


CTAAGGGAAA 


ATTAATTAAT 


4140 


AGCGACGATT 


TACAGAAGCA 


AGGTTATTCA 


CTCACATATA 


TTGATTTATG 


TACTGTTTCC 


4200 


ATTAAAAAGG 


TAATTCAAAT 


GAAATTGTTA 


AATGTAATTA 


ATTTTGTTTT 


CTTGATGTTT 


4260 


GTTTCATCAT 


CTTC2!TTTGC 


TCAGGTAATT 


GAAATGAATA 


ATTCGCCTCT 


GCGCGATTTT 


4320 


GTAACTTGGT 


ATTCAAAGCA 


ATGAGGCGAA 


TCCGTTATTG 


TTTCTCCCGA 


TGTAAAAGGT 


4380 


ACTGTTACTG 


TATATTCATC 


TGACGTTAA& 


CCTGAAAATC 


TACGCAATTT 


CTTTATTTCT 


4440 


GTTTTACGTG 


CTAATAATTT 


TGATATGGTT 


GGTTCAATTC 


CTTCCATTAT 


TTAGAAGTAT 


4500 


AATCCAAACA 


ATCAGGATTA 


TATTGATGAA 


TTGCCATCAT 


CTGATAATCA 


GGAATATGAT 


4560 


GATAATTCCG 


CTCCTTCTGG 


TGGTOTCTTT 


GTTCCGCAAA 


ATGATAATGT 


TACTCAAACT 


4620 


TTTAAAATTA 


ATAACGTTCG 


GGCAAAGGAT 


TTAATACGAG 


TTGTCGAATT 


GTTTGTAAAG 


4680 


TCTAATACTT 


CTAAATCCTC 


AAATGTATTA 


TCTATTGACG 


GCTCTAATCT 


ATTAGTTGTT 


4740 


AGTGCACCTA 


AAGATATTTT 


AGATAACCTT 


CCTCAATTCC 


TTTCTACTGT 


TGATTTGCCA 


4800 


ACTGACCAGA TATTGATTGA GGGTTTGATA TTTGAGGTTC AGCAAGGTGA TGCTTTAGAT 4860

TTTTCATTTG CTGCTGGCTC TCAGCGTGGC ACTGTTGCAG GCGGTGTTAA TACTGACCGC 4920

CTCACCTCTG TTTTATCTTC TCCTGGTGGT TCGTTCGGTA TTTTTAATGG CGATGTTTTA 4980

GGGCTATCAG TTCGCGCATT AAAGACTAAT AGCCATTCAA AAATATTGTC TGTGCCACGT 5040

ATTCTTACGC TTTCAGGTCA GAAGGGTTCT ATCTCTGTTC GCCAGAATGT CCCTTTTATT 5100

ACTGGTCGTG TGACTGGTGA ATCTGCCAAT GTAAATAATC CATTTCAGAC GATTCAGCGT 5160

CAAAATGTAG GTATTTCCAT GAGCGTTTTT CCTGTTGCAA TGGCTGGCGG TAATATTGTT 5220

CTGGATATTA CCAGCAAGGC CGATAGTTTG AGTTCTTCTA CTCAGGCAAG TGATGTTATT 5280

ACTAATCAAA GAAGTATTGC TACAACGGTT AATTTGCGTG ATGGACAGAC TCTTTTACTC 5340

GGTGGCCTCA CTGATTATAA AAACACTTCT CAAGATTCTG GCGTACCGTT CCTGTCTAAA 5400

ATCCCTTTAA TCGGCCTCCT GTTTAGCTCC CGCTCTGATT CCAACGAGGA AAGCACGTTA 5460

TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG 5520

TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT 5580

CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG 5640

GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA 5700

TTTGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC GCCCTTTGAC 5760
GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTCTTCCAA ACTGGAACAA CACTCAACCC 5820

TATCTCGGGC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGAAC CACCATCAAA 5880

CAGGATTTTC GCCTGCTGGG GCAAACCAGC GTGGACCGCT TGCTGCAACT CTCTCAGGGC 5940

GAGGCGGTCA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 6000 

GCGCCCAATA CGCAAACCGC CTCTCCCCGC GCGTTGGCCG ATTCATXAAT GCAGCTGGCA 6060 

CGACAGGTTT CCCGACTGGA AAGCGGGCAG TGAGCGCAAC GCAATTAATG TGAGOTTAGCT 6120

CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTCGTATGT TGTCTGGAAT 6180 

TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 6240

GTAGGAGAGC TCGGCGGATC CGAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT 6300 

AGTTTACAGG CAAGTGCTAC TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 6360 

GTTGGTGCTA CCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 6420 

GCTGGCGTAA TAGCGAAGAG GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA 6480 

ATGGCGAATG GCGCTTTGCC TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 6540 

AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600 

ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 6660 

CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATCAA AGCTGGCTAC 6720

AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780

TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATXAACG TTTACAATTT AAATATTTGC 6840

TTATACAATC tTTCCTCTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900 

CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960 

TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC 7020 

AGCTAGAACG GTTGAATATC ATATTGATGG TCATTTGACT GTCTCCGGCC TTTCTCACCC 7080

TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 

AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200 

TGTTOTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 

TTCTTTGCCT TGCCTCTATG ATTTATTGGA CGTT 7294 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7394 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG XCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTAMACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGGAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGAIIGA TACAAATCTC CGTTCTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200


CAAAGATGAG TGTTTTAGTG TATOCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

AXTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740

CTGTGGAATG CTAGAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220

GATCCATTCG TTTGTGAATA TCAZ^CCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCXG AGGGTGGTGG CTCTGAGGGT 2340

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CXATGACCGA AAATGCCGAT 2460

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580

GGTCATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTC 3000

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120

TCTCTGTAAA GGCTGCTATT TTGATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180

ATTGGGAIAA AXAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240


CTCGTTAGCG TTCGTAAGAT TTAGGATAAA ATTCTAGCTG GGTGCAAAAT AGCAACTAAT 3300

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540

AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTCTTGA TAAAGAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTCTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TCCCTCTGCC TAAATTACAT 3720

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780

ACTGGTAAGA ATTTOTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTXGA AAAAGTTTTC ACGCGTTCTT 3960

TGTCTTGCGA TTGGATTTGC ATCAGCATTT AGATATAGTT ATATAACCCA ACCTAAGCCG 4020

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TOACTCTTCT 4080

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200
ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAMT CCTTCCATAA TTCAGAAGTA 4500

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTCTAAA 4680

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTCTTGCA GGCGGTGTTA ATACTCACCG 4920

CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100

TACTGGTCGT GTGACTGGTO AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160

TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220

TCTCGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280


TACTAATGAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520

GTGTGGTGGT TACGCGGAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940

CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTGATTAA TGCAGCTGGC 6060

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180

TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 6240
GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300

AAGCACTATT GCACTCGCAC TCTTACCGTT ACTGTTTACC CCTGTGGCAA AAGCCCTTCT 6360

GAGGCATCCG GGAGCTGAAG GCGATGACCC TGCTAAGGCT GCATTCAATA GTTTACAGGC 6420

AAGTGCTACT GAGTACATXG GCTACGCTOXS GGCTATGGTA GTAGTTATAG TTGGTGCTAC 6480

CATAGGGATT AAATTATTCA AAAAGTTTAC GAGCAAGGCT TCTTAAGCAA TAGCGAAGAG 6540

GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGGAGCCTGA ATGGCGAATG GCGCTTTGCC 6600

TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG AGTGCGATCT TCCTGAGGCC 6660

GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT ACGATGCGCC CATCTACACC 6720

AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC CCACGGAGAA TCCGACGGGT 6780

TCTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC AGGAAGGCCA GACGCGAATT 6840

ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT TTAACAAAAA TTTAACGCGA 6900

ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC TTATACAATC TTCCTGTTTT 6960

TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA CATGCTAGTT TTACGATTAC 7020

CGTTCATCGA TTCTCTTCTT TGCTCCAGAC TCTCAGGCAA TGACCTGATA GCCTTTGTAG 7080

ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC AGCTAGAACG GTTGAATATC 7140

ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC TTTTGAATCT TTACCTACAC 7200

ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA AAATTTTTAT CCTTGCGTTG 7260

AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA TGTTTTTGGT ACAACCGATT 7320

TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA TTCTTTGCCT TGCCTGTATC 7380

ATTTATTGGA CGTT 7394



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GATCCTAGGC TGAAGGCGAT GACCCTGCTA AGGCTGC 37 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA 35
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT 35 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GGTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT 35 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TACGAGCAAG GCTTCTTA 18 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AGCTTAAGAA GCCTTGCTCG TAAACTTTTT GAATAATTT 39 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
AATCCCTATG GTAGCACCAA CTATAACTAC TACCAT 36 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AGCCCAAGCG TAGCCAATGT ACTCAGTAGC ACTTG 35 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC 34 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
ATCGCCTTCA GCCTAG 
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CTCGAATTCG TACATCCTGG TCATAGC 27 
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CATTTTTGCA GATGGCTTAG A 21 
(2) INFORMATION FOR SEQ ID NO: 19: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TAGCATTAAC GTCCAATA 18
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATATATTTTA GTAAGCTTCA TCTTCT 26
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GACAAAGAAC GCGTGAAAAC TTT 23 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GCGGGCCTCT TCGCTATTGC TTAAGAAGCC TTGCT 35
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
TTCAGCCTAG GATCCGCCGA GCTCTCCTAC CTGCGAATTC GTACATCC 48 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TGGATTATAC TTCTAAATAA TGGA 24 
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 36 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
AATTCGCCAA GGAGACAGTC AT 22 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT 39
(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
ATTACTCGCT GCCCAACCAG CCATGGCCGA GCTCGTGAT 39 
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TCTAGAACGC GTC 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
ACGTGACGCG TTCTAGAATT AACACTCATT CCTGT 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TGGATATCTG GAGTCTGGGT CATCACGAGC TCGGCCATG 
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GCTGGTTGGG CAGCGAGTAA TAACAATCCA GCGGCTGCC 39 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
GTAGGCAATA GGTATTTCAT TATGACTGTC CTTGGCG 37 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
TGACTGTCTC CTTGGCGTGT GAAATTGTTA 30 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 36
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CAATTTTATC CTAAATCTTA CCAAC 25 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
CATTTTTGCA GATGGCTTAG A 21 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CGAAAGGGGG GTGTGCTGCA A 21
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
TAGCATTAAC GTCCAATA 
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
AAACGACGGC CAGTGCCAAG TGACGCGTGT GAAATTGTTA TCC 43
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GGCGAAAGGG AATTCTGCAA GGCGATTAAG CTTGGGTAAC GCC 43 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GGCGTTACCC AAGCTTTGTA CATGGAGAAA ATAAAG 36 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
TGAAACAAAG CACTATTGCA CTGGCACTCT TACCGTTACC GT 
(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
TACTGTTTAC CCCTGTGACA AAAGCCGCCC AGGTCCAGCT GC 42 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
TCGAGTCAGG CCTATTGTGC CCAGGGATTG TACTAGTGGA TCCG 44 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
TGGCGAAAGG GAATTCGGAT CCACTAGTAC AATCCCTG 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
GGCACAATAG GCCTGACTCG AGCAGCTGGA CCAGGGCGGC TT 
(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
TTGTCACAGG GGTAAACAGT AACGGTAACG GTAAGTGTGC CA 42
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 



GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGTAC AA 42 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

TAACGGTAAG AGTGCCAGTG C 21 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 68 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference

(B) LOCATION: replace(25, "")

(D) OTHER INFORMATION: /note= "M represents an equal
mixture of A and C at this location and at
locations 28, 31, 34, 37, 40, 43, 46 & 49."



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
AGCTCCCGGA TGCCTCAGAA GATGMNNMNN MNNMNNMNNM NNMNNMNNMN NGGCTTTTGC 60

CACAGGGG 68
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference

(B) LOCATION: replace(17, "")

(D) OTHER INFORMATION: /note= "M represents an equal
mixture of A and C at this location and at 
locations 20, 22, 26, 29, 32, 35, 38, 41, 44 & 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
CAGCCTCGGA TCCGCCMNNM NNMNNMNNMN NMNNMNNMNN MNNMNNATGM GAAT 54
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GGTAAACAGT AACGGTAAGA GTGCCAG 27
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
GGGCTTTTGC CACAGGGGT 19 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
AGGGTCATCG CCTTCAGCTC CGGATCCCTC AGAAGTCATA AACCCCCCAT AGGCTTTTGC 60

CAC 63
(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
TCGCCTTCAG CTCCCGGATG CCTCAGAAGC ATGAACCCCC CATAGGC 47 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
CAATTTTATC CTAAATCTTA CCAAC 25 
(2) INFORMATION FOR SEQ ID NO:59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GCCTTCAGCC TCGGATCCGC C 21
(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
CGGATGCCTC AGAAGCCCCN N 
(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CGGATGCCTC AGAAGGGCTT TTGCCACAGG 30 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:

Thr Gln Ser Lys Cys Ser Thr Asp His Trp Leu Gly Tyr Ile Glu Tyr
1 5 10 15

Phe Ile Met Cys Thr Tyr
20

(2) INFORMATION FOR SEQ ID NO:63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

Cys Asp Asp Gln Tyr Tyr Thr Asp His Glu Gln Gly Lys Cys Glu Val
1 5 10 15

Ala Leu Tyr Tyr Thr Gly
20

(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

Thr Gln Ser Lys Cys Ser Thr Asp His Trp Leu Gly Tyr Ile Glu Tyr
1 5 10 15

Phe Ile Met Cys Thr Tyr Arg Arg
20



(2) INFORMATION FOR SEQ ID NO:65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

Thr Cys Leu Arg Glu Glu Phe Ile Leu Gln Cys Tyr Ile Val Met Ile
1 5 10 15

Glu Asp Trp Tyr 
20 

(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

Ile Cys Glu His His Gln Met Leu Leu Gln Cys Ser Leu Val Cys Glu
1 5 10 15

Glu Cys Met Met
20 



(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 

Lys Cys Ile Ile Gly Trp Tyr Thr Leu Thr Cys Tyr Met Ser Asp Arg
1 5 10 15

Pro Arg Met Glu 
20 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

Ala Cys Thr Gln Asp Met Asn Trp Ile Thr Cys Pro Met Tyr Cys Glu
1 5 10 15

Val Leu Cys Phe
20 

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:

Val Cys Phe Tyr Phe Pro Phe Lys Met Met Cys His Met Glu Tyr Ile
1 5 10 15

Ala Tyr Glu Tyr 
20 

(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:

Asp Ala Asn Cys Gly His Cys Thr Tyr Met Cys Ile Cys Lys Ile Met
1 5 10 15

Tyr Tyr Ile Ser
20 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

Trp His Arg His Val Ser Ser Pro Met Ser Cys Trp Trp Tyr Asp Gln
1 5 10 15

Cys Ala Val Ala
20 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 

Cys Val Gln Ile Asp Phe Phe Thr Val Gln Cys Asn Ile Ser Ser His
1 5 10 15

Met Phe Leu Pro 
20 
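
The three-letter peptide entries above can be cross-checked against the one-letter sequences recited in the claims. The sketch below is illustrative only and is not part of the original listing; the names `THREE_TO_ONE` and `to_one_letter` are assumptions, and the residue spellings are the standard three-letter codes for SEQ ID NOS: 65 and 72.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter code."""
    # capitalize() tolerates case variation such as "cys" or "CYS".
    return "".join(THREE_TO_ONE[res.capitalize()] for res in three_letter.split())


seq_65 = ("Thr Cys Leu Arg Glu Glu Phe Ile Leu Gln Cys Tyr Ile Val Met Ile "
          "Glu Asp Trp Tyr")
seq_72 = ("Cys Val Gln Ile Asp Phe Phe Thr Val Gln Cys Asn Ile Ser Ser His "
          "Met Phe Leu Pro")

print(to_one_letter(seq_65))  # prints TCLREEFILQCYIVMIEDWY
print(to_one_letter(seq_72))  # prints CVQIDFFTVQCNISSHMFLP
```

Both results match the first and last sequences recited in claims 4 and 13.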

I CLAIM:

1. A composition of matter comprising a 
plurality of cells containing a diverse population of 
expressible oligonucleotides, each of said 
oligonucleotides encoding a soluble peptide having
constrained secondary structure in solution, wherein each
of said oligonucleotides is operationally linked to 
expression elements, said expressible oligonucleotides 
having a desirable bias of random codon sequences. 

2. The composition of claim 1, wherein said 
oligonucleotides have more than one codon encoding an 
amino acid capable of forming a covalent bond.

3. The composition of claim 2, wherein said 
amino acid is an amino acid selected from the group 
consisting of cysteine, glutamic acid, lysine, leucine or 
tyrosine. 

4. The composition of claim 2, wherein said 
oligonucleotide is selected from the group consisting of 
TCLREEFILQCYIVMIEDWY, ICEHHQMLLQCSLVCEECMM, 
KCIIGWYTLTCYMSDRPRME, ACTQDMNWITCPMYCEVLCF, 
VCFYFPFKMMCHMEYIAYEY, DANCGHCTYMCICKIMYYIS,
WHRHVSSPMSCWWYDQCAVA and CVQIDFFTVQCNISSHMFLP.

5. The composition of claim 1, wherein said
cells are procaryotes.

6. The composition of claim 5, wherein said
procaryotic cells are E. coli.

7. The composition of claim 1, wherein said 
expressible oligonucleotides are expressed as peptide 
fusion proteins on the surface of a filamentous 
bacteriophage.

8. A composition of matter comprising a
plurality of cells containing a diverse population of
expressible oligonucleotides, each of said
oligonucleotides encoding a soluble peptide having
constrained secondary structure in solution, wherein each
of said oligonucleotides is operationally linked to
expression elements, said expressible oligonucleotides
having a desirable bias of random codon sequences
produced from random combinations of first and second
oligonucleotide precursor populations, each or either of
said first and second precursor having a desirable bias
of random codon sequences.

9. The composition of claim 8, wherein said 
first or second precursor oligonucleotides are biased.

10. The composition of claim 8, wherein said
first and second precursor oligonucleotides are biased.

11. The composition of claim 8, wherein said
first or second precursor oligonucleotides have more than
one codon encoding an amino acid capable of forming a
covalent bond. 

12. The composition of claim 8, wherein said 
first and second precursor oligonucleotides have at least 
one codon encoding an amino acid capable of forming a 
covalent bond. 



13. The composition of claim 8, wherein said
oligonucleotide is selected from the group consisting of
TCLREEFILQCYIVMIEDWY, ICEHHQMLLQCSLVCEECMM,
KCIIGWYTLTCYMSDRPRME, ACTQDMNWITCPMYCEVLCF,
VCFYFPFKMMCHMEYIAYEY, DANCGHCTYMCICKIMYYIS,
WHRHVSSPMSCWWYDQCAVA and CVQIDFFTVQCNISSHMFLP.

14. The composition of claim 11 or 12, wherein
said amino acid is an amino acid selected from the group
consisting of cysteine, glutamic acid, lysine, leucine or
tyrosine.

15. The composition of claim 8, wherein said 
cells are procaryotes. 

16. The composition of claim 15, wherein said 
procaryotic cells are E. coli.

17. The composition of claim 8, wherein said 
expressible oligonucleotides are expressed as peptide 
fusion proteins on the surface of a filamentous 
bacteriophage.

18. A kit for the preparation of vectors
useful for the expression of a diverse population of
random soluble peptides having constrained secondary
structure in solution, said peptides being generated from
combined first and second precursor oligonucleotides when
combined having a desirable bias of random codon
sequences, comprising: two vectors: a first vector having
a cloning site for said first precursor oligonucleotides
and a pair of restriction sites for operationally
combining first precursor oligonucleotides with second
precursor oligonucleotides; and a second vector having a
cloning site for said second precursor oligonucleotides
and a pair of restriction sites complementary to those on
said first vector, one or both vectors containing
expression elements capable of being operationally linked
to said combined first and second precursor
oligonucleotides.

19. The kit of claim 18, wherein said vectors 
are in a filamentous bacteriophage. 

20. The kit of claim 18, wherein said 
filamentous bacteriophage are M13. 

21. The kit of claim 18, wherein said vectors 
are plasmids or phagemids. 

22. The kit of claim 18, wherein said first or
second precursor oligonucleotides are biased toward a
predetermined sequence.

23. The kit of claim 18, wherein said first
and second precursor oligonucleotides are biased toward a
predetermined sequence.

24. The kit of claim 18, wherein said first or 
second precursor oligonucleotides have more than one 
codon encoding an amino acid capable of forming a 
covalent bond. 

25. The kit of claim 18, wherein said first 
and second precursor oligonucleotides have at least one 
codon encoding an amino acid capable of forming a 
covalent bond. 

26. The kit of claim 24 or 25, wherein said 
amino acid is an amino acid selected from the group 
consisting of cysteine, glutamic acid, lysine, leucine or 
tyrosine. 

27. A cloning system for expressing
oligonucleotides encoding random, soluble peptides having
constrained secondary structure in solution, said
oligonucleotides being generated from a desirable bias of
random codon sequences, comprising a vector having a pair
of restriction sites so as to allow the operational
combination of said oligonucleotides into a contiguous
oligonucleotide encoding said soluble peptide having
constrained secondary structure in solution.

28. The cloning system of claim 27, wherein 
said oligonucleotides have more than one codon encoding 
an amino acid capable of forming a covalent bond. 

29. A cloning system for expressing
oligonucleotides encoding random, soluble peptides having
constrained secondary structure in solution, said
oligonucleotides being generated from diverse populations
of combined first and second precursor oligonucleotides
each or either having a desirable bias of random codon
sequences, comprising: a set of first vectors having a
desirable bias of random codon sequences and a second set
of vectors having a diverse population of second
precursor oligonucleotides having a desirable bias of
random codon sequences, said first and second vectors
each having a pair of restriction sites so as to allow
the operational combination of said oligonucleotides into
a contiguous oligonucleotide encoding said soluble
peptide having constrained secondary structure in
solution.

30. The composition of claim 29, wherein said 
first or second precursor oligonucleotides are biased. 

31. The composition of claim 29, wherein said 
first and second precursor oligonucleotides are biased. 

32. The cloning system of claim 29, wherein 
said first or second precursor oligonucleotides have more 
than one codon encoding an amino acid capable of forming 
a covalent bond. 

33. The cloning system of claim 29, wherein
said first and second precursor oligonucleotides have at
least one codon encoding an amino acid capable of forming 
a covalent bond. 

34. The cloning system of claim 32 or 33, 
wherein said amino acid is an amino acid selected from 
the group consisting of cysteine, glutamic acid, lysine, 
leucine or tyrosine. 

35. The cloning system of claim 29, wherein 
said combined first and second vectors is through a pair 
of restriction sites. 

36. The cloning system of claim 29, wherein 
said expressible oligonucleotides are expressed as 
peptide fusion proteins on the surface of a filamentous 
bacteriophage.

37. A vector comprising an oligonucleotide, 
said oligonucleotide having a desirable bias of random 
codon sequences, and more than one codon encoding an 
amino acid capable of forming a covalent bond.

38. A vector of claim 37, wherein said amino
acid is an amino acid selected from the group consisting
of cysteine, glutamic acid, lysine, leucine or tyrosine.

39. An isolated, soluble peptide having a
constrained secondary structure in solution.

40. An expressible oligonucleotide produced by
the cloning system of claim 29.

41. A host cell containing the cloning system
of claim 29.

42. A host cell containing the vector of claim
38.

43. A method of isolating a soluble peptide 
having a constrained secondary structure in solution, 
which comprises growing said host cell of claim 41 or 42 
under suitable conditions favoring expression of said
peptide, and isolating said peptide so produced.

44. A method of constructing a diverse 
population of vectors containing combined first and 
second precursor oligonucleotides, wherein each or either 
precursor oligonucleotides has a desirable bias of random
codon sequences, and capable of expressing said combined
oligonucleotides as random, soluble peptides having 
constrained secondary structure in solution, comprising 
the steps of: 

(a) operationally linking sequences from a
diverse population of first precursor
oligonucleotides having a desirable bias
of random codon sequences to a first
vector;

(b) operationally linking sequences from a
diverse population of second precursor
oligonucleotides having a desirable bias
of random codon sequences to a second
vector;

(c) wherein said first or second, or first and
second precursor oligonucleotides have at
least one codon capable of forming a
covalent bond,

(d) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second precursor
oligonucleotides are joined together into
a population of combined vectors capable
of being expressed.



45. The method of claim 44, wherein said amino 
acid is an amino acid selected from the group consisting 
of cysteine, glutamic acid, lysine, leucine or tyrosine.

46. The method of claim 44, wherein steps (a)
through (d) are repeated two or more times.
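
Several of the claims above recite more than one codon encoding an amino acid capable of forming a covalent bond. As a hedged illustration only, the sketch below tallies cysteine residues in the eight peptide sequences recited in claims 4 and 13. Restricting the tally to cysteine is an assumption made for simplicity, since the claims also list glutamic acid, lysine, leucine and tyrosine, and the names `CLAIMED_PEPTIDES` and `cysteine_positions` are hypothetical.

```python
# One-letter sequences as recited in claims 4 and 13.
CLAIMED_PEPTIDES = [
    "TCLREEFILQCYIVMIEDWY", "ICEHHQMLLQCSLVCEECMM",
    "KCIIGWYTLTCYMSDRPRME", "ACTQDMNWITCPMYCEVLCF",
    "VCFYFPFKMMCHMEYIAYEY", "DANCGHCTYMCICKIMYYIS",
    "WHRHVSSPMSCWWYDQCAVA", "CVQIDFFTVQCNISSHMFLP",
]


def cysteine_positions(peptide: str) -> list[int]:
    """Return the 1-based positions of cysteine (C) in a one-letter sequence."""
    return [i for i, residue in enumerate(peptide, start=1) if residue == "C"]


for peptide in CLAIMED_PEPTIDES:
    positions = cysteine_positions(peptide)
    print(peptide, len(positions), positions)
```

Under this assumption, each listed peptide contains at least two cysteines, and every one has a cysteine at position 11.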

[Figure: vector nucleotide sequence listing, positions 1 to 3840 and 6601 to 6900, printed in six 10-base columns per 60-base row]

| 10 | 20 | 30 | 40 | 50 | 60

1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
i^i TGACCTCTTA TCAAAAGGAG CAATTAAAG6 TACTCTCTAA TCCTGACCTG 300 

301 TTGGA6TTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGC6 ATATHGAAG 350 
361 TCTTTCGGGC HCCTCTTAA TCTTTTTGAT GCAATCCGCT HGCTTCTGA CTATAATAGT ^70 
J21 CAGGGTAAAG ACCTGATTH TGATTTATGG TCATTCTCGT THCTGAACT GTHAAAGCA 480 
551 ni^^5§55^ ^n^MI^AA TATTTATGAC GATTCCGCA6 TAHGGACGC TATCCAGTCT 540 
li] fWCATTTTA CTATTACCCC CTCTGGCAAA ACnCHTTG CAAAAGCCTC TCGCTATTTT 600 

Wl §§Hni4K ^J^^KT^^I ^^55^5^1 TAT6ATAGT6 TT6CTCTTAC TATGCCTCGT 650 
661 AATTCCTTTT GGC6TTATGT ATCTGCAHA GTTGAAT6T6 GTATTCCTAA ATCTCAACTG 720 

ll\ ^M^J^TUr ^Wr^WM Jh^l^Tl^U ^^^IMW GTTTTATTAA C6TA6ATTTT 780 
781 TCTTCCCAAC GTCCTGACTG GTATAATGA6 CCAGTTCnA AAATCGCATA AGGTAAHCA 840 
AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCIGGIGHT 900 
CAA6CCTTAT TCACTGAATG AGCAGC1TTG TTACGTTGAT TTG6GTAATG 960 
4^1^199551 ATTACTCnG ATGAAGGTCA 6CCAGCCTAT GCGCCTGGTC 1020 

}Ril J§J49^9P5I ICATCT6TCC TCTTTCAAAG TT6GTCAGTT CGGTTCCCTT AT6ATT6ACC 1080 
5IPJ599^9I f5JT99£5QI AAGTAACATG 6A6CAGGTCG CGGATTTCGA CACAATTTAT 1140 
U5? P}§§P54J§? TACAAATCTC CGTTGTACn TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 
}Wi f4M5^K^5 TGTTTTAGT6 TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 
1261 GTGGCATTAC 6TATTTTACC CGTTTAATGG AAACHCCTC ATGAAAAAGT CTTTAGTCCT 1320 
1321 CAAAGCCTCT GTAGCCGHG CTACCCTCGT TCC6AT6CTG TCTTTCGCTG CTGAGGGTGA 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 
}JS} I^5I55§55 ^T95II§II§ TCATTGTCGG CGCAACTATC G6TATCAA6C TGTTTAAGAA 1500 
4TJ^99I95 AAAGCAA6CT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 
innGGAGA TTTTCAAC6T 6AAAAAATTA HATTCGCAA TTCCTTTA6T TGTTCCTTTC 1620 
IfflS^^^J ^f^PI^M^C TGTTGAAA6T TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 
m45IM&5 I&I5§ftS^54 CGACAAAACT TTA6ATCGTT ACGCTAACTA TGAGGGTT6T 1740 
iJSI S5I5§^^J5 CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCA6TG TTACGGTACA 1800 
}§2i K§STJP9l^ JJGGGCTTGC TATCCCTGAA AATGA6G6TG 6TG6CTCTGA GG6TGGCGGT 1860 
i§§} mGAGGGTG 6CG6TTCT6A GGGT66CGGT ACTAAACCTC CTGA6TACG6 T6ATACACCT 1920 
}§H ^Ift^H'^J^J CAACCCTCTC GACGGCACTT ATCCGCCTG6 TACTGAGCAA 1980 

1981 AACCCC6CTA ATCCTAATCC TTCTCHGAG GAGTCTCA6C CTCHAATAC TTTCATGnT 2040 
2041 CA6AATAATA GGTTCCGAAA TAGGCAGGGG 6CATTAACTG TTTATAC66G CACTGTTACT 2100 
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCAT6 2160 
JATGACGCTT ACTG6AAC6G TAAAHCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 
2221 6ATCCATTCG JTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 
2281 GCTG6CGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 
2341 G6CGGTTCTG A6GGTGGC66 CTCTGAGGGA GGCGGHCCG GTGGTGGCTC IGGHCCGGT 2400 
5^11115^11 4T5^M^GAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 
2451 6AAAACGCGC TACAGTCT6A C6CTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 
2521 GCTGCTATC6 ATGGTTTCAT TGGTGACGTT TCCGGCCHG CTAATGGTAA TGGTGCTACT 2580 
W,} WMTlTVi TI?S£AAATG 6CTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

™T6AATA ATTTCC6TCA ATATTTACCT TCCCTCCCTC AATC6GTTGA ATGTCGCCCT 2700 
ni5ICTTTA 6CGCTGGTAA ACCATAT6AA TTTTCTATTG ATT6TGACAA AATAAACTTA 2760 
2761 TTCCGTGGT6 TCTTTGCGTT TCTTTTATAT GHGCCACCT TTATGTATGT ATTTTCTACG 2820 
iU\ III59IAACA TACTGCGTAA TAA66AGTCT TAATCATGCC AGnCTTTTG GGTATTCCGT 2880 

SIT^IT555 JIX59J^99J TI95II?Igg taacthgit cggctatctg cnACTmc 2940 

29^1 JTAAAAAGGG CTTCGGTAAG ATAGCTAHG CTATTTCATT GHTCTTGCT CnAHAHG 3000 
3001 6GCTTAACTC AATTCTT6TG GGTTATCTCT CTGATAHAG CGCTCAAHA CCCTCTGACT 3050 
3061 TT6TTCAGG6 JGTTCAGTTA ATTCTCCCGT GTAATGCGCT TCCCTGnTT TATGHAnC 3120 
3121 TCTCTGTAAA 6GCTGCTATT TTCATTnTTG ACGTTAAACA AAAAATCGTT TCTTATnGG 3180 
3181 ATTGG6ATAA ATAATATGGC TGTTTATTTT 6TAACTGGCA AATTAGGCTC TGGAAAGACG 3240 
3241 CTCGTTAGCG TTG6TAAGAT TTAGGATAAA AHGTAGCTG GGT6CAAAAT AGCAACTAAT 3300 
3301 CTTGATTTAA G6CTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 
3361 CTTAGAATAC CG6ATAAGCC TTCTATATCT GAHTGCTTG CTATTGGGCG CG6TAATGAT 3420 
3421 TCCTACGAT6 AAAATAAAAA CGGCnGCTT 6TTCTCGATG AGTGCGGTAC TTGGniAAT 3480 
3481 ACCCGTTCTT GGAAT6ATAA GGAAAGACAG CCGATTATTG AnGGTHCT ACATGCTCGT 3540 
3541 AAATTA6GAT 6GGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGnGA TAAACAG6CG 3500 
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 
3561 TTT6TCG6TA CTTTATATTC TCTTATTACT GGCTC6AAAA TGCCTCTGCC TAAATTACAT 3720 
3721 6TTG6C6TTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CT6TTGAGCG TTGGCTTTAT 3780 
3781 ACTGGTAA6A ATTTGTATAA CGCATAT6AT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 
IIS? Kffi^JU JUfllilTT ^^$9?£JT'^'r TTATCACACG GTCGGTATTT CAAACCATTA 3900
3901 AATTTA6GTC AGAAGATGAA ATTAACTAAA ATATATTTGA AAAAGTTTTC TCGC6TTCTT 3960 
Im} Wr9U&A VM'^JVM Wf^^^^in ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 
SSi? ?5ffl5J^I? TCAGACCTAT GAniTGATA AATTCACTAT TGACTCTTCT 4080 

Sf^SIfI^^ JlflfJ^fJJ lES^KII TTCAAGGATT CTAAGGGAAA ATTAATTAAT iil40 
JiSl SfiSfS^^ ^^^115^$'^ CTCACATATA TTGATTTATG TACTGTTTCC 4200 

J201 ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTn TCTTGATGH 4250 
526I TGTTTCATCA TCTTCTTTTG CTCA6GTAAT T6AAATGAAT AATTCGCCTC TGCGCGATH 4320 
511} IfSfSI^? KUfW^^^ ^^J9^^$£5^ ATCCGTTATT 6TTTCTCCCG ATGTAAAAG6 4380 
;381 TACT6TTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 
44A1 TGTTTTAC6T GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4';no 
J501 TAATCCAAAC AATCA6GATT ATATTGATGA ATTGCCATCA TCT6ATAATC A66AATAT6A 4560 
;561 TGATAATTCC GCTCCTTCTG GT6GTTTCTT TGHCCGCAA AAT6ATAAT6 TTACTCAAAC 4620 
J|21 TTTTAAAATT AATAACGTTC G6GCAAAGGA TTTAATACGA GTT6TC6AAT TGTTT6TAAA 4680 
LJb} ^Ifrfflffl IflfWI^H i^^^W.^Tl ATCTAHGAC G6CTCTAATC TATTAGnGT 4740 
Jflni AA^I^fffSf {W^KffJ JAGATAACCT TCCTCAAnC CTTTCTACTG TT6ATTTGCC 4800 
{801 AACTGACCAG ATATTGATTG AGGGTTT6AT ATTTGAG6TT CAGCAAGGTG ATGCTTTAGA 4860 
Jq§} JHUfKft ^flffil^^S CKAGC6TGG CACTGTTGCA GGCGGTGHA ATACTGACCG 4920 
2qri ^BJJfJflT ffiHS^T^^ TTCGHCGGT ATTTTTAATG GCGATGHn 4980 

Sn§i tS^PSK? ^ffi^S^ffi J-JS^^fell^S AAAATATT6T CT6T6CCACG 5040 

^im TArTrffff^ fIJIfiPSP 1^1^9^611 6GCCAGAATG TCCCnTTAT 5100 

WJi^W, ?J${S§5I5 ffiCJGCCAA TGTAAATAAT CCATTTCAGA CGATTGA6CG 5160 
llo} Trrff ataS S^MJJIfff JPfK?im JCCTGTTGCA AT66CTGGCG GTAATATT6T 5220 
III] TAffSSJffl 5^Si??M?? SSJI^^m $^§II£nCT ACTCA6GCAA 6TGATGTTAT 5280 
lit} rrrxff rfxf {f^JfSIS fflf^^^^^I IMTF§CGT 6ATGGACAGA CTCTTTTACT 5340 
lln] AATrrrTTrS A^rPffflB W4^^PJI9 19^^^^^^^ GGCGTACCGT TCCT6TCTAA 5400 
I2fii ATlfrTrrfr frfSSPSf BIJI^$9JP CCGCTCTGAT TCCAACGAGG AAAGCACGH 5450 
5g61 ATACGTGCTC GTCAAA6CAA CCATA6TACG C6CCCTGTAG CGGCGCAHA A6CGCG6CGG 5520 
5521 GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTT6CCAG CGCCCTAGCG CCCGCTCCH 5580 

rf rfJffffl SfHSHI PJ9§^ACGT TCGCC6GCTT TCCCC6TCAA GCTCTAAATC 5640 
Ifni ATT?^P??Pf IJ7$??5ffi ?^4^JJJ^5I§ CTTTAC6GCA CCTCGACCCC AAAAAACTT6 5700 
5701 ATTTGG6T6A TGGTTCACGT AGTGG6CCAT CGCCCT6ATA GACGGHTTT CGCCCTHGA 5760 

llo} rTAT^^fSf f^fSJSH J^J^T^§^5 ICTTGTTCCA AACTGGAACA ACACTCAACC 5820 

lU] fSronjJ ?f Jllffl^^ 5§^IIFGCC GATTTCG6AA CCACCATCAA 5880 

III} rr^rfrJJrr f^ffSfl?? 5?£^^05QAG CGTG6ACCGC HGCTGCAAC TCTCTCAGG6 5940 

Inm fff^^f^5I? 5f^?5SWJf J^?I$n99P CGTCTC6CTG GTGAAAAGAA AAACCACCCT 6000 

Inci Arf^fffJS ^P^ff}??P^ 99IPJP£££5 CGCGnGGCC GAHCATTAA T6CAGCTGGC 6060 

6061 ACGACAGGTT TCCCGACTGG AAAGC6GGCA 6TGAGCGCAA CGCAAHAAT GTGAGTTA6C 6120 

6121 TCACTCATTA GGCACCCCA6 GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180 

M] HS^fSf^p fKffiffiT ^43fP59^^ ^^^Sft^'J^TC ATAATGAAAT ACCTATTGCC 6240 

ciS} J}P^P9^^P5 ^PT^^^S^J BIIftST?§C TGCCCAACCA GCCATGGCCG AGCTCGTGAT 6300 



6301 GACCCA6ACT CCAGAATTCC ATCCGGAA 
6361 ACT6GCCGTC GTTTTACAAC 6TCGTGAC 
6421 CCTTGCAGCA CACCCCCCH TC6CCAGC 



,6 ACTGHAATT CTAGAAC6CG TAA6CTTGGC 6360 
G G6AAAACCCT GGCGTTACCC AACTTAATCG 6420 
G GC6TAATAGC GAAGAGGCCC GCACCGATCG 6480 



til} SfffifPf^f ff^JIffPf* TTTGCCTGGt ftcCGGCACC 6540 

film f^ff^FSSI^ ?f^^{f S^fJ §?PJ?§4?J^ 9^^J^n?£I GAGGCCGATA CGGTCGTCGT 6600 

fSffiWP I5?f^?JIP? ^^^^£55 TGCGCCCATC TACACCAACG TAACCTATCC 6550 

6661 CATTACGGTC AATCC6CCGT TTGTTCCCAC GGAGAATCCG TCGGGHGIT ACTCGCTCAC 6720 

fi7Si TrrTATTrff ^AlSJSJfH PSH?4^^^ CGAATTATTT TTGATGGCGT 5780 

caSi TTAArB^I SS^^nj^^ ^S'J^MII^A AC GCGAA HT TAACAAAATA 5840 

Aoni HWf^UJf ffHIJfMI ^TH^flT^I ACAATCnCC TGTTTTTGGG GCTTTTCTGA 6900 

6901 TTATCAACCG G66TACATAT GATTGACATG CTA6TTTTAC GAnACCGTT CATCGATTCT 6960 

TTo} rJWrrrTrl fff^fffilf ???fifil^^^ TTGTAGATCT CTCAAAAATA 7020 

7801 TTrAfffTrT ffrrffTBf A6AACGGTTG AATATCATAT TGATGGTGAT 7080 

7?§i rrATTTAAAA ff^f^PIfff Kfffflin GAATCTTTAC CTACACAHA CTCAGGCAH 7140 
79m rffrrAAAAr tattI^J^?^ miftl^^^^^ GCGTTGAAAT AAAGGCTTCT 7200 

7?ci ffrrfrTr}? rffiffiS^^^ BATAATGTT TTTGGTACAA CCGAHTAGC TTTATGCTCT 7260 
7261 GAGGCnTAT TGCTTAATTT TGCTAATTCT TTGCCTTGCC IGTATGAHT AHGGACGn 7320 
| 10 | 20 | 30 | 40 | 50 | 60
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i 10 
1 AAT6CTACTA 
61 ATAGCTAAAC 
121 CenCGCAGA 
181 GHGCATATT 



2ti 
30 



361 TCTTTCGGGC 



i\2 
48 
5H 
50 
66 



721 ATGAATCTTT 

781 TCTTCCCAAC 

m CAAT6ATTAA 

901 CTCGTCAG6G 

961 AATATCCG6T 

1021 T6TACACCGT 

1081 GTCTGCGCCT 



11^1 
120 



1261 GTGGCAHAC 



132 
138 

m 

150 
156 
162 
168 
174 
180 
186 
192 
198 
204 



216 



228 



240 
246 
2521 
258 
264 
270 
276 
282 
288 
294 
300 
306 
312 
318 
324 
330 
3361 
342 
348 
354 
360 
366 
372 



TCTGCAAAAA 
TTGGAGTnG 



CAGG6TAAAG 
TTTGAGGG6G 
AA ACATT TTA 
GGTnTTATC 
AATTCCTT 



CAGGCGATGA 
CAAAGATGAG 



CAAAGCCTCT 
C6ATCCCGCA 
T6CGTGGGCG 
ATTCACCTCG 
rnTTGGAGA 
TAHCTCACT 
TTTACTAACG 
CT6TGGAATG 
TGG6TTCCTA 
TCTGAGGGTG 
AHCCGGGCT 
AACCCCGCTA 
CAGAATAATA 



2101 CAAGGCACTG 



TATGACGCn 



2221 GATCCAHCG 



GCTGGCG6CG 



2341 G6CGGTTCTG 



GATHTGAn 
GAAAACGCGC 
GCTGCTATCG 
GGTGATTTTG 
HAATGAATA 
TTTGTCnTA 
nCCGTGGTG 
TTTGCTAACA 
TATTATTGCG 
TTAAAAAGGG 
GGCHAACTC 
TTGHCAGGG 
TCTCTGTAAA 
AnGGGATAA 
CTCGTTAGCG 
CHGATHAA 
CHAGAATAC 
TCCTAC6ATG 
ACCCGTTCn 
AAAHAGGAT 
CGTTCT6CAT 
THGTCGGTA 
GTTGGCGTTG 



I 20 
CTATTA6TAG 
A6GTTATTGA 
ATTGGGAATC 
TAAAACATGT 
T6ACCTCTTA 
CTTCCGGTCT 
TTCCTCnAA 
ACCTGATTTT 
ATTCAATGAA 
CTATTACCCC 
GTCGTCTGGT 
GGCGHATGT 
CTACCTGTAA 
GTCCTGACTG 
AGTTGAAATT 
CAAGCCTTAT 
TCTTGTCAAG 
TCATCTGTCC 
CGTTCCGGCT 
TACAAATCTC 
TGniTAGTG 
GTATTTTACC 
GTAGCCGTTG 
AAAGCGGCCT 
ATGGTTGTTG 
AAAGCAAGCT 
TTTTCAACGT 
CCGCTGAAAC 
TCTGGAAAGA 
CTACAGGCGT 
HGGGCnGC 
GCGGTTCTGA 
ATACTTATAT 
ATCCTAATCC 
GGTTCCGAAA 
ACCCCGHAA 
ACTGGAACGG 
TTTGTGAATA 
GCTCTGGTGG 
AGGGTGGCGG 
ATGAAAAGAT 
TACAGTCTGA 
AIGGHTCAT 
CTGGCTCTAA 
ATTTCCGTCA 
GC6CTGGTAA 
TCTTTGCGn 
TACTGCGTAA 
TTTCCTCGGT 
CTTCGGTAAG 
AATTCTTGTG 
TGTTCAGTTA 
GGCTGCTAH 
ATAATATGGC 
TTGGTAAGAT 
GGCTTCAAAA 
CGGATAAGCC 
AAAATAAAAA 
GGAAT6ATAA 
GGGATAHAT 
TA6CTGAACA 
CTTTATATTC 
TTAAATATGG 



I 30 
AATTGAT6CC 
CCATHGCGA 
AACTGTTACA 
I'GAGCTACAG 
TCAAAAGGAG 
GGTTCGCTTT 
TCTTTTTGAT 
TGATTTAT6G 
TATTTATGAC 
CTCTGGCAAA 
AAACGAGGGT 
ATCTGCATTA 
TAATGTTGTT 
GTATAATGAG 
AAACCATCTC 
TCACTGAATG 
ATTACTCTTG 
TCmCAAAG 
AAGTAACATG 
CGHGTACTT 
TAnCTTTCG 
C6TTTAATGG 
CTACCCTCGT 
TTAACTCCCT 
TCATTGTCGG 
GATAAACCGA 
GAAAAAATTA 
TGHGAAAGT 
CGACAAAACT 
TGTAGTTTGT 
TATCCCTGAA 
GGGTGGCGGT 
CAACCCTCTC 
TTCTCTTGAG 
TAGGCAGGGG 
AACTTATTAC 
TAAAHCAGA 
TCAA6GCCAA 
TGGTTCTGGT 
CTCTGAGGGA 
GGCAAACGCT 
CGCTAAAGGC 
TGGTGACGTT 
TTCCCAAATG 
ATATTTACCT 
ACCATATGAA 
TCTTHATAT 
TAAG6AGTCT 

nccncTGG 

ATAGCTAHG 
GGTTATCTCT 
AHCTCCCGT 
nCATTTTTG 
TGTnATTTT 
TCAGGATAAA 
CCTCCCGCAA 
TTCTATATCT 
CGGCTT6CTT 
6GAAAGACAG 
TTTTCTTGTT 
TGTTGnTAT 
TCTTATTACT 
CGATTCTCAA 



! 40 
ACCTTTTCA6 
AATGTATCTA 
TG6AATGAAA 
CACCAGATTC 
CAATTAAAG6 
GAAGCTCGAA 
6CAATCCGCT 
TCATTCTC6T 
6ATTCCGCAG 
ACTTCTTTTG 
TATGATAGTG 
GTTGAATGTG 
CCGTTAGTTC 
CCAGTTCnA 
AAGCCCAATT 
AGCAGCTHG 
ATGAAGGTCA 
TT6GTCAGTT 
GAGCAGGTCG 
TGTTTCGCGC 
CCTCTTTCGT 
AAACTTCCTC 
TCCGATGCTG 
GCAAGCCTCA 
CGCAACTATC 
TACAATTAAA 
TTATTCGCAA 
TGTTTAGCAA 
nAGATCGTT 
ACTGGT6ACG 
AATGAGGGT6 
ACTAAACCTC 
GACGGCACTT 
GAGTCTCAGC 
GCATTAACTG 
CAGTACACTC 
GACTGCGCTT 
TCGTCTGACC 
GGCGGCTCTG 
GGCGGHCCG 
AATAAGGGGG 
AAACHGATT 
TCCGGCCTTG 
GCTCAAGTCG 
TCCCTCCCTC 
TITTCTATTG 
GHGCCACCT 
TAATCATGCC 
TAACTHGIT 
CTATTTCATT 
CTGATATTAG 
CTAATGCGCT 
ACGTTAAACA 
6TAACTGGCA 
AHGTAGCTG 
GTCGGGAGGT 
GAHTGCTTG 
GTTCTCGATG 
CCGATTATTG 
CAGGACTTAT 
TGTCGTCGTC 
6GCTC6AAAA 
TTAAGCCCTA 



50 

CTCGCGCCCC 
AT6GTCAAAC 
CTTCCAGACA 
AGCAATTAAG 
TACTCTCTAA 
TTAAAACGCG 
TTGCTTCTGA 
TTTCTGAACT 
TATTGGACGC 
CAAAAGCCTC 
TT6CTCTTAC 
GTATTCCTAA 
GTTnATTAA 
AAATCGCATA 
TACTACTCGT 
TTACGHGAT 
GCCAGCCTAT 
CGGTTCCCTT 
CGGATTTCGA 
HGGTATAAT 
niAGGTTGG 
ATGAAAAAGT 
TCTHCGCTG 
GCGACCGAAT 
GGTATCAAGC 
GGCTCCTTTT 
nCCITTAGT 
AACCCCATAC 
ACGCTAACTA 
AAACTCAGTG 
GTGGCTCTGA 
CTGAGTACGG 
ATCCGCCTGG 
CTCHAATAC 
TTTATACGGG 
CTGTATCATC 
TCCATTCTG6 
TGCCTCAACC 
AGGGTGGTGG 
GTGGTGGCTC 
CTATGACCGA 
CTGTCGCTAC 
CTAATGGTAA 
GTGACGGTGA 
AATCGGTTGA 
ATTGTGACAA 
TTATGTATGT 
AGTTCnTTG 
CGGCTATCTG 
GITTCTTGCT 
CGCTCAAHA 
TCCCTGTTTT 
AAAAATC6TT 
AAHAGGCTC 
GGTGCAAAAT 
TCGCTAAAAC 
CTATTGGGCG 
AGTGCGGTAC 
ATTGGnTCT 
CTAnGHGA 
TGGACAGAAT 
TGCCTCTGCC 
CTGHGAGCG 



| 60
AAATGAAAAT 60
TAAATCTACT 120
CCGTACTTTA 180
CTCTAAGCCA 240
TCCTGACCTG 300
ATATTTGAAG 360
CTATAATAGT 420
GTTTAAAGCA 480
TATCCAGTCT 540
TCGCTATTTT 600
TATGCCTCGT 660
ATCTCAACTG 720
CGTAGATTTT 780
AGGTAATTCA 840
TCTGGTGTTT 900
TTGGGTAATG 960
GCGCCTGGTC 1020 
ATGATTGACC 1080 
CACAATTTAT 1140 
CGCTGGGGGT 1200 
TGCCTTCGTA 1260 
CHTAGTCCT 1320 
CTGAGGGTGA 1380 
ATATCGGTTA 1440 
TGHTAAGAA 1500 
GGAGCCTTTT 1550 
TGTTCCTTTC 1520 
AGAAAAHCA 1680 
TGAGGGHGT 1740 
TTACGGTACA 1800 
GGGTGGCGGT 1860 
TGATACACCT 1920 
TACTGA6CAA 1980 
TnCATGTTT 2040 
CACTGTTACT 2100 
AAAAGCCATG 2160 
CnTAATGAA 2220 
TCCTGTCAAT 2280 
CTCTGAGGGT 2340 
TGGTTCCGGT 2400 
AAATGCCGAT 2460 
TGAHACGGT 2520 
TGGTGCTACT 2580 
TAATTCACCT 2640 
ATGTCGCCCT 2700 
AATAAACTTA 2760 
ATTTTCTACG 2820 
GGTAHCCGT 2880 

cnAcnrrc 2940 

CTTATTATTG 3000 
CCCTCTGACT 3060 
TATGHATTC 3120 
TCTTATTTGG 3180 
TGGAAAGACG 3240 
AGCAACTAAT 3300 
GCCTCGCGTT 3360 
CGGTAATGAT 3420 
TTGGTTTAAT 3480 
ACATGCTCGT 3540 
TAAACAGGC6 3600 
TACHTACCT 3660 
TAAATTACAT 3720 
TTG6CTTTAT 3780 
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ACTGGTAAGA 
TCCGGTGTTT 
AATTTAGGTC 
TGTCTT6CGA 
6AG6TTAAAA 
CAGCGTCTTA 
AGCGACGATT 
ATTAAAAAAG 
TGHTCATCA 
TGTAACTTGG 
TACTGHACT 
TGniTACGT 
TAATCCAAAC 
TGATAAHCC 
TTTTAAAATT 
GTCTAATACT 
TAGTGCACCT 
AACTGACCAG 
TTTTTCATTT 
CCTCACCTCT 
AGGGCTATCA 
TAHCTTACG 
TACTG6TCGT 
TCAAAATGTA 
TCTGGATATT 
TACTAATCAA 
CGGT6GCCTC 
AATCCCHTA 
ATACGTGCTC 
GTGTGGTGGT 
TCGCTTTCTT 
6GGGGCTCCC 
ATTTGGGTGA 
CGnGGAGTC 
CTATCTCGGG 
ACAGGATTTT 
CCAGGCGGTG 
GGCGCCCAAT 
ACGACAG6TT 
TCACTCAHA 
TTGTGAGCGG 
6TGACTGGGA 
AA6CACTATT 
CGCCCAGGTC 
CTAGGCTGAA 
TGAGTACAH 
TAAATTATTC 
GATCGCCCTT 
GCACCAGAAG 
GTCGTCCCCT 
TATCCCATTA 
CTCACATTTA 
GGCGTTCCTA 
AAATAHAAC 
TCTGATTATC 
ATTCTCTTGT 
AAATAGCTAC 
GTGATTTGAC 
GCATTGCATT 
CTTCTCCCGC 
GCTCTGAGGC 
ACGTT 
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ATHGTATAA 
AHCTTATTT 
AGAAGATGAA 
TTGGATTTGC 
AGGTAGTCTC 
ATCTAAGCTA 
TACAGAAGCA 
GTAATTCAAA 
TCTTCTTTTG 
TATTCAAAGC 
GTATATTCAT 
GCTAATAATT 
AATCAGGATT 
GCTCCTTCTG 
AATAAC6TTC 
TCTAAATCCT 
AAAGATATTT 
ATATTGATTG 
GCTGCTG6CT 
GHTTATCTT 
GTTCGCGCAT 
CTHCAGGTC 
GT6ACT66TG 
GGTATTTCCA 
ACCAGCAAG6 
AGAAGTATTG 
ACTGATTATA 
ATCGGCCTCC 
GTCAAAGCAA 
TACGCGCAGC 
CCCTTCCTTT 
nTAGGGnC 
TGGnCACGT 
CAC6TTCTTT 
CTATTCTTTT 
CGCCTGCTGG 
AA6GGCAATC 
AC6CAAACCG 
TCCCGACTGG 
GGCACCCCAG 
ATAACAATTT 
AAACCCTGGC 
GCACTGGCAC 
CA6CTGCTCG 
GGCGATGACC 
GGCTAC6CTT 
AAAAAGTTTA 
CCCAACA6TT 
C6GTGCC6GA 
CAAACTGGCA 
CGGTCAATCC 
ATGTTGATGA 
TTGGHAAAA 
GTTTACAATT 
AACCGGGGTA 
TT6CTCCAGA 
CCTCTCCGGC 
TGTCTCCGGC 
TAAAATATAT 
AAAAGTATTA 
TTTATTGCTT 
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CGCATATGAT 
AAC6CCTTAT 
GCTTACTAAA 
ATCAGCATH 
TCAGACCTAT 
TCGCTATGH 
AGGTTATTCA 
TGAAATTGH 
CTCAGGTAAT 
AATCAGGCGA 
CTGACGTTAA 
TT6ATATGGT 
ATATTGAT6A 
GTGGTTTCTT 
G6GCAAAGGA 
CAAATGTATT 
TAGATAACCT 
AGGGTTTGAT 
CTCAGC6TGG 
CTGCTGGTGG 
TAAAGACTAA 
AGAAGGGTTC 
AATCTGCCAA 
TGAGCGHTT 
CCGATAGTH 
CTACAACGGT 
AAAACACTTC 
TGTTTAGCTC 
CCATAGTACG 
GT6ACC6CTA 
CTCGCCACGT 
CGATTTAGT6 
AGTGGGCCAT 
AATAGTGGAC 
GATTTATAA6 
GGCAAACCAG 
A6CTGTTGCC 
CCTCTCCCCG 
AAAGCGGGCA 
GCmACACT 
CACACGCGTC 
GHACCCAAG 
TCTTACCGH 
A6TCAGGCCT 
CT6CTAAGGC 
GGGCTATGGT 
CGAGCAAGGC 
GCGCAGCCTG 
AAGCTGGCT6 
GATGCAC6GT 
GCCGTTTGn 
AAGCTGGCTA 
AATGAGCT6A 
TAAATATTTG 
CATATGATTG 
CTCTCAGGCA 
AnAATHAT 
CTTTCTCACC 
6AGGGTTCTA 
CAGGGTCATA 
AATTTTGCTA 
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ACTAAACAGG 
TTATCACACG 
ATATATHGA 
ACATATAGTT 
GATHTGATA 
TTCAAGGATT 
CTCACATATA 
AAATGTAATT 
TGAAAT6AAT 
ATCCGHATT 
ACCTGAAAAT 
TGGnCAATT 
ATTGCCATCA 
TGTTCCGCAA 
TTTAATAC6A 
ATCTAHGAC 
TCCTCAATTC 
ATHGAGGIT 
CACTGHGCA 
TTCGHCGGT 
TAGCCATTCA 
TATCTCTGH 
TGTAAATAAT 
TCCTGHGCA 
GAGTTCnCT 
TAAHTGCGT 
TCAA6ATTCT 
CCGCTCTGAT 
CGCCCTGTAG 
CACTTGCCAG 
TCGCCGGCTT 
CHTACGGCA 
CGCCCT6ATA 
TCTTGHCCA 
GGATTnGCC 
CGTGGACCGC 
CGTCTCGCTG 
CGCGTT6GCC 
GTGAGCGCAA 
HATGCTTCC 
ACTTGGCACT 
CTTTGTACAT 
ACCGTTACTG 
ATTGTGCCCA 
TGCATTCAAT 
AGTAGTTATA 
TTCTTAAGCA 
AATG6C6AAT 
GAGTGCGATC 
TACGATGCGC 
CCCACGGAGA 
CAGGAA6GCC 
THAACAAAA 
CHATACAAT 
ACATGCTAGT 
ATGACCTGAT 
CAGCTAGAAC 
CTTTTGAATC 
AAAATTTTTA 
ATGTTTTTGG 
ATTCTTTGCC 



40 



CTTTTTCTAG 
GTCGGTATTT 
AAAAGTTTTC 
ATATAACCCA 
AATTCACTAT 
CTAAG6GAAA 
TTGATTTATG 
AATTTTGTTT 
AAHCGCCTC 
GTTTCTCCCG 
CTACGCAATT 
CCTTCCATAA 
TCTGATAATC 
AATGATAATG 
GTTGTCGAAT 
GGCTCTAATC 
CTHCTACTG 
CAGCAAGGTG 
GGCGGT6TTA 
ATTTTTAATG 
AAAATATTGT 
GGCCAGAATG 
CCAHTCAGA 
AT6GCTGGCG 
ACTCAGGCAA 
GATGGACAGA 
GGC6TACCGT 
TCCAACGAG6 
CGGCGCATTA 
CGCCCTAGCG 
TCCCC6TCAA 
CCTCGACCCC 
GACGGniTT 
AACTG6AACA 
GATTTCGGAA 
TTGCT6CAAC 
GTGAAAAGAA 
GAHCATTAA 
CGCAATTAAT 
GGCTCGTATG 
GGCCGTCGH 
GGAGAAAATA 
TTTACCCCTG 
GG6GATTGTA 
A6TTTACA6G 
GTT6GTGCTA 
ATAGC6AA6A 
GGCGCTTTGC 
TTCCTGAGGC 
CCATCTACAC 
ATCCGACG6G 
AGACGC6AAT 
ATTTAACGCG 
CnCCTGTTT 
TTTACGATTA 
AGCCTHGTA 
GGHGAATAT 
TTTACCTAGA 
TCCTT6CGTT 
TACAACCGAT 
TTGCCTGTAT 
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TAATTATGAT 
CAAACCATT^ 
AC6CGTTCTT 
ACCTAAGCCG 
TGACTCTTCT 
ATTAATTAAT 
TACTGHTCC 
TCTTGAT6TT 
TGCGCGATTT 
ATGTAAAAGG 
TCTTTATTTC 
TTCAGAA6TA 
AGGAATATGA 
TTACTCAAAC 
TGTTTGTAAA 
TATTAGTTGT 
TTGATHGCC 
ATGCTTTAGA 
ATACTGACC6 
GCGATGTTTT 
CTGTGCCACG 
TCCCTHTAT 
CGATTGAGCG 
GTAATATTGT 
GTGATGTTAT 
CTCTTTTACT 
TCCTGTCTAA 
AAAGCACGH 
AGC6CGGCGG 
CCCGCTCCn 
GCTCTAAATC 
AAAAAACHG 
CGCCCHTGA 
ACACTCAACC 
CCACCATCAA 
TCTCTCAGGG 
AAACCACCCT 
TGCAGCTGGC 
GTGAGHAGC 
TTGT6TG6AA 
TTACAACGTC 
AAGTGAAACA 
TGACAAAA6C 
CTAGTGGATC 
CAAGTGCTAC 
CCATAGGGAT 
6GCCCGCACC 
CTG6TTTCCG 
CGATACGGTC 
CAAC6TAACC 
TTGTTACTCG 
TAnTTTGAT 
AAHTTAACA 
TTGGGGCni 
CCGTTCATCG 
GATCTCTCAA 
CATATTGATG 
CATTACTCAG 
GAAATAAAGG 
TTAGCTTTAT 
GATTTATT6G 
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3840 
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4020 
4080 
4140 
| 10 | 20 | 30 | 40 | 50 | 60



1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTGCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1581 TTTACTAACG TCTGGAAAGA CGACAAAACT TTA6ATCGTT ACGCTAACTA TGAG6GTTGT 1740 
1741 CTGTGGAATG CTACAGGCGT IGTAGTHGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 
1801 TGGGnCCTA HGGGCnGC TATCCCTGAA AATGAGGGTG 6TGGCTCTGA GG6TGGCG6T 1860 
1861 TCTGAGGGTG GCGGHCTGA GGGTGGC6GT ACTAAACCTC CT6AGTACGG TGATACACCT 1920 
1921 AHCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACH ATCC6CCTGG TACTGAGCAA 1980 
1981 AACCCCGCTA ATCCTAATCC nCTCHGAG GAGTCTCAGC CTCnAATAC TTT^^I^m 
20ai CAGAATAATA GGHCCGAAA TAGGCA66GG GCAHAACTG TTTATACG6G CACT6TTACT 2100 
2101 CAAGGCACTG ACCCCGTTAA AACTTAnAC CAGTACACTC CTGTATCATC AAAA6CCATG 2150 
2161 TATGACGCn ACTGGAACGG TAAATTCAGA GACTGCGCH TCCATTCTG6 CTTTAAT6AA 2220 
2221 GATCCAHCG TTTGTGAATA TCAA6GCCAA TCGTCTGACC TGCCTCAACC JCCTGTCAAT 2280 
2281 GCTGGCGGCG CGTCTGGTGG TGGnCTGGT GGCGGCTCT6 AGGGT6GTG6 CTCT6AGG6T 2340 
23A1 GGCGGHCTG AGGGTGGC6G CTCT6AGG6A GGCGGHCCG 6TG6TGGCTC TG6TTCCGGT 2400 
2401 GATTTTGAn ATGAAAAGAT GGCAAACGCT AATAAGG6GG CTAT6ACCGA AAATGCC6AT 2460 
2461 GAAAACGCGC TACAGTCTGA C6CTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 
2521 GCTGCTATCG ATGGITTCAT TGGTGACGn TCCGGCCHG CTAATGGTAA T6GT6CTACT 2580 
2581 6GTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 
2541 HAATGAATA AmCCGTCA ATATTTACCT TCCCTCCCTC AATCGGHGA ATGTCGCCCT 2700 
2701 rrrGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTAnG AHGTGACAA AATAAACTTA 2760 
2761 TTCCGTGGTG TCTTTGCGTT TCTmATAT GHGCCACCT TTATGTATGT ATTTTCTACG 2820 
2821 THGCTAACA TACTGC6TAA TAA6GAGTCT TAATCATGCC AGnCTlTTG GGTATTCCGT 2880 
2881 TAHAnGCG mCCTCGGT TTCCTTCTGG TAACTTTGn CGGCTATCT6 CmCTTTTC 2940 
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTAHG CTATTTCATT GnTCTTGCT CTTATTATT6 3000 
3001 GGCHAACTC AAnCTTGTG G6TTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 
3061 TTGHCAGGG TGnCAGTTA AHCTCCCGT CTAAT6C6CT TCCCTGTTTT TATGTTATTC 3120 
3121 TCTCTGTAAA GGCTGCTATT nCATTnTG ACGHAAACA AAAAATCGTT TCTTATTTGG 3180 
3181 AHGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAG6CTC TGGAAAGAC6 3240 
3241 CTCGHAGCG HGGTAAGAT TTAG6ATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGG6AGGT TCGCTAAAAC GCCTCGCGTT 3360 
3361 CTTAGAATAC CGGATAAGCC HCTATATCT GATTTGCnG CTAHGGGCG CGGTAAT6AT 3420 
3421 TCCTACGATG AAAATAAAAA CGGCTTGCn GHCTCGATG AGTGCGGTAC TTG6TTTAAT 3480 
3481 ACCCGTTCn GGAATGATAA GGAAAGACAG CCGAHAHG AnGGTTTCT ACATGCTC6T 3540 
3541 AAATTAGGAT GGGATATTAT TTTTCnGn CAGGACTTAT CTAnGTTGA TAAACAG6CG 3600 
3601 CGHCTGCAT TAGCTGAACA TGnGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3560 
3661 TTTGTCGGTA CTHATAnC TCTTAnACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 
3721 GHGGCGnG HAAATATGG CGATTCTCAA TTAAGCCCTA CT6TTGAGC6 TT^GCTTTAT 3780 
3781 ACTGGTAAGA ATHGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 
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38i|l TCCGGTGni ATTCTTATTT AACGCCTTAT TTATCACAC6 GTCGGTATTT CAAACCATTA 3900 
3901 AATHAGGTC AGAAGATGAA GCTTACTAAA ATATAHTGA AAAA6TTTTC ACGCGTTCTT 3960 
3951 TGTCTTGCGA TTGGATTT6C ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 
4021 6AGGTTAAAA AGGTAGTCTC TCAGACCTAT GATHTGATA AATTCACTAT TGACTCTTCT 4080 
4081 CAGCGTCHA ATCTAAGCTA TCGCTATGTT HCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 
4141 AGCGACGAH TACAGAAGCA AGGTTATTCA CTCACATATA TTGAHTATG TACIGTHCC 4200 
4201 AHAAAAAAG GTAATTCAAA TGAAATTGTT AAAT6TAATT AAniTGnT TCTTGATGH 4250 
4251 TGTTTCATCA TCnCTTTTG CTCAGGTAAT TGAAATGAAT AAHCGCCTC TGCGCGATTT 4320 
4321 TGTAACTTGG TAHCAAAGC AATCAGGCGA ATCCGTTATT GTHCTCCCG AT6TAAAAGG 4380 
4381 TACTGHACT GTATAHCAT CTGACGHAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 
4441 TGITTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCHCCATAA TTCAGAAGTA 4500 
4501 TAATCCAAAC AATCAGGATT ATATT6ATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 
4561 TGATAAHCC GCTCCHCTG GTGGTTTCTT TGHCCGCAA AAT6ATAATG TTACTCAAAC 4520 
4621 TTTTAAAAn AATAACGHC 6GGCAAAG6A HTAATACGA GTTGTCGAAT TGTHGTAAA 4580 
4681 GTCTAATACT TCTAAATCCT CAAATGTAH ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 
4741 TAGT6CACCT AAAGATAHT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 
4801 AACTG ACCAG ATATTGAHG AGGGTTTGAT ATTT6AGGTT CAGCAAGGT6 AT6CTTTAGA 4850 
4851 TTTTTCATn GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGT6TTA ATACTGACCG 4920 
4921 CCTCACCTa GTTTTATCTT CTGCTGGTGG nCGTTCGGT ATTTnAATG GCGATGTTn 4980 
4981 AGGGCTATCA GnCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 
5041 TATTCHACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100 
5101 TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATHCAGA CGATTGAGCG 5150 
5161 TCAAAATGTA GGTAHTCCA TGAGCGHH TCCTGHGCA ATGGCTG6CG 6TAATATTGT 5220 
5221 TCTGGATAn ACCAGCAAGG CCGATAGHT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 
5281 TACTAATCAA AGAAGTATTG CTACAACGGT TAATTT6CGT GATGGACAGA CTCTTTTACT 5340 
5341 CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT G6CGTACCGT TCCTGTCTAA 5400 
5401 AATCCCTHA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGH 5460 
5461 ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CG6CGCATTA AGCGCGGCG6 5520 
5521 GT6TGGTGGT TACGCGCAGC GTGACCGCTA CACTT6CCAG CGCCCTAGCG CCCGCTCCH 5580 
5581 TCGCTTTCn CCCTTCCTTT CTCGCCAC6T TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640 
5641 G6G6GCTCCC TTTAGGGHC CGAHTAGTG CTTTACGGCA CCTC6ACCCC AAAAAACHG 5700 
5701 ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTHH CGCCCHTGA 5760 
5761 CGTT66AGTC CACGTTCTTT AATAGTGGAC TCHGnCCA AACTGGAACA ACACTCAACC 5820 
5821 CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATHCGGAA CCACCATCAA 5880 
5881 ACAGGAnn CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAG6G 5940 
5941 CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG 6TGAAAAGAA AAACCACCCT 6000 
6001 GGCGCCCAAT AC6CAAACCG CCTCTCCCCG CGCGTTGGCC GATTCAHAA TGCA6CTG6C 5060 
6061 ACGACAGGH TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 
5121 TCACTCAHA GGCACCCCAG GCTHACACT HATGCTTCC GGCTCGTATG HGTGTGGAA 6180 
6181 HGTGAGCGG ATAACAATTT CACACGCGTC ACnGGCACT GGCCGTCGTT TTACAACGTC 6240 
6241 GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGA6AAAATA AAGTGAAACA 6300 
6301 AAGCACTAH 6CACTGGCAC TCTTACCGTT ACTGTTTACC CCTGTGGCAA AAGCCTATGG 6360 
6361 GGGGTTCATG CTTCTGAGGC ATCCGGGAGC TGAAGGCGAT GACCCTGCTA AGGCTGCAH 5420 
5421 CAATAGTTTA CAGGCAAGTG CTACTGAGTA CATTG6CTAC GCTTG6GCTA TGGTAGTAGT 6480 
6481 TATAGnGGT GCTACCATAG GGATTAAATT AHCAAAAAG TTTACGAGCA AGGCTTCTTA 6540 
6541 AGCAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC AGTTGC6CAG CCTGAATGGC 6500 
6601 GAATGGCGCT TTGCCTGGn TCCGGCACCA GAAGCGGTGC CGGAAAGCTG GCTGGAGTGC 6560 
6661 GATCTTCCT6 AGGCCGATAC GGTCGTC6TC CCCTCAAACT GGCAGATGCA CGGTTACGAT 6720 
6721 GCGCeCATCT ACACCAACGT AACCTATCCC AHACGGTCA ATCCGCCGH TGHCCCACG 6780 
6781 GAGAATCCGA CGGGTTGTTA CTCGCTCACA TnAATGHG ATGAAAGCTG GCTACAGGAA 6840 
6841 G6CCAGACGC GAATTAnTT TGATGGCGH CCTAnGGH AAAAAATGA6 CTGATHAAC 6900 
6901 AAAAATTTAA CGCGAATTTT AACAAAATAT TAACGTHAC AATHAAATA TTTGCTTATA 5950 
6951 CAATCTTCCT GTTTTTGGGG CTTTTCTGAT TATCAACCGG GGTACATAT6 ATTGACATGC 7020 
7021 TAGTITTACG ATTACCGTTC ATCGAHCTC TIGTHGCTC CAGACTCTCA GGCAATGACC 7080 
7081 TGATAGCCTT TGTAGATCTC TCAAAAATAG CTACCCTCTC CGGCAHAAT TTATCAGCTA 7140 
7141 GAACGGTTGA ATATCATATT GATGGTGATT TGACTGTCTC CGGCCTTTCT CACCCTTTTG 7200 
7201 AATCniACC TACACATTAC TCAGGCAHG CATTTAAAAT ATATGAGGGT TCTAAAAATT 7250 
7261 TTTATCCnG CGTTGAAATA AAGGCTTCTC CCGCAAAAGT ATTACAGGGT CATAATGTTT 7320 
7321 TTGGTACAAC CGATTTAGCT TTATGCTCTG AGGCTTTATT GCTTAATTH GCTAATTCTT 7380 
7381 TGCCTTGCCT GTATGATTTA TTGGACGTT 7409
| 10 | 20 | 30 | 40 | 50 | 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTG6AGA TTTTCAACGT 6AAAAAATTA TTATTC6CAA TK^JII^SI MISSWS ]lln 
1621 TATTCTCACT CCGCT6AAAC TGTT6AAAGT T6TTTA6CAA AACCCCATAC AGAAAATTCA 1580 
1681 TTTACTAAC6 TCTGGAAAGA CGACAAAACT nAGATCGTT AC6CTAACTA T6A6GGTT6T 1740 
1741 CT6TGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCA6TG TTAC6GTACA 1800 
1801 TGGGTTCCTA TTG66CTT6C TATCCCTGAA AATGA6GGT6 GT66CTCT6A G66T6GC6GT 1860 
1861 TCT6AG6GTG GCGGHCTGA G6GT6GCGGT ACTAAACCTC CTGA6TACGG TGATACACCT 1920 
1921 ATTCCG66CT ATACHATAT CAACCCTCTC 6ACGGCACTT ATCCGCCT6G TACT6AGCAA 1980 
1981 AACCCC6CTA ATCCTAATCC TTCTCTTGAG GA6TCTCA6C CTCTTAATAC TTTCAT6TTT 2040 
2041 CAGAATAATA GGTTCCGAAA TAGGCAGG66 6CATTAACTG TTTATACG6G CACTGTTACT 2100 
2101 CAA66CACTG ACCCC6TTAA AACTTAHAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 
2161 TATGACGCn ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCT6G CTTTAATGAA 2220 
9221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 
2281 GCT66CGGCG 6CTCT6GTGG TGGTTCT6GT GGCGGCTCTG A6G6TG6TG6 CTCTGAGG6T 2340 
2341 G6CGGTTCTG AGGGTGGCGG CTCTGA6GGA GGCGGTTCCG GTGDT6GCTC TGGTTCCGGT 2400 
2401 GATTTTGAn ATGAAAAGAT GGCAAACGCT AATAAG6GG6 CTATGACCGA AAATGCCGAT 2460 
2461 GAAAACGCGC TACAGTCT6A CGCTAAAGGC AAACTTGATT CTGTC6CTAC TGATTACGGT 2520 
2521 6CTGCTATC6 ATGGTTTCAT IGGIGACGH TCCG6CCTTG CTAATGGTAA T6GTGCTACT 2580 
2581 GGIGATTHG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 
2641 TTAATGAATA AHTCCGTCA ATAHTACCT TCCCTCCCTC AATC66TTGA ATGTC6CCCT 2700 
2701 TTTGTCTTTA GC6CT6GTAA ACCATATGAA TTTTCTATT6 ATTGT6ACAA {ATAAACTTA 2750 
2761 TTCCGT6GTG TCHTGCGn TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 
2821 TTT6CTAACA TACTGCGTAA TAAGGA6TCT TAATCAT6CC AGTTCTTTTG 66TATTCC6T 2880 
2881 TATTATTGCG HTCCTCGGT nCCTTCTGG TAACmGTT CGGCTATCTG CnACTTTTC 2940 
2941 TTAAAAA6GG CTTCG6TAAG ATAGCTAHG CTATTTCATT 6TTTCTTGCT CTTATTATTG 3000 
3001 GGCTTAACTC AATTCTTGT6 GGTTATCTCT CTGATATTAG C6CTCAATTA CCCTCTGACT |060 
3061 TT6TTCAGGG TGHCAGITA ATTCTCCC6T CTAATGC6CT TCCCT6TTTT TATGTTATTC 3120 
3121 TCTCTGTAAA GGCTGCTAH TTCATTTTTG ACGHAAACA AAAAATCGTT TCTTATTT6G 3180 
3181 ATTGGGATAA ATAATATGGC TGnTATTTT GTAACTGGCA AATTAG6CTC TGGAAAGACG 3240 
3241 CTC6TTAGCG TTG6TAAGAT TCA6GATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 
3301 CTTGATHAA 66CTTCAAAA CCTCCCGCAA GTC66GA6GT JCGCTAAAAC GCCTCGC6TT 3360 
3361 CTTAGAATAC C66ATAA6CC HCTATATCT GATTT6CTTG CTATT666CG CGGTAAT6AT 3420 
3421 TCCTACGATG AAAATAAAAA CGGCTTGCn GTTCTC6ATG AGT6CGGTAC TTGGTTTAAT 3480 
3481 ACCCGTTCTT GGAATGATAA 6GAAAGACAG CCGATTATTG ATT6GTTTCT ACAT6CTCGT 3540 
3541 AAATTAG6AT GG6ATATTAT CTTCCTTGTT CAGGACHAT CTAnGHGA TAAACAGGCG 3600 
3601 C6TTCTGCAT TAGCTGAACA TGnGTTTAT T6TCGTCGTC T6GACAGAAT TACTTTACCT 3660 
3661 TTT6TCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA JGCCTCTGCC TAAATTACAT 3720 
3721 GTTGGCGTTG HAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 
3781 ACTG6TAAGA ATHGTATAA CGCATAT6AT ACTAAACA6G CTTTnCTAG TAATTATGAT 3840 
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GAGGHAAAA AG6TAGTCTC TCAGACCTAT GATTTTGATA 
CAGCGTCTTA ATCTAA6CTA TCGCTATGH HCAAGGAH 

AG6TTATTCA CTCACATATA 

GAAATTGTTA AAT6TAATTA 
CnCTTTTGC TCAGGTAAH GAAATGAATA 
ATTCAAAGCA ATCAGGCGAA TCCGTTATTG 
TATATTCATC TGACGTTAAA CCT6AAAATC 
CTAATAATTT TGATATGGTT CCnCAATTC 
ATCAGGATTA TATTGATGAA TTGCCATCAT 
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AGT6CACCTA AA6ATATTTT A6ATAACCTT CCTCAATTCC 
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AAAAGTTTTC AC6CGTTCTT 
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1851 TnTCATTTG CTGCTGGCTC TCAGC6TGGC ACT6TTGCAG GCGGTGHAA TACTGACCGC 1920 



TGCTGG 
AAA6AC 
GAAGGG 



1921 CTCACCTCTG TTTTATCTTC 
1981 GGGCTATCAG TTCGCGCAn 
5011 ATTCnACGC TTTCA6GTCA 
5101 ACTGGTCGTG TGACTGGTGA 
CAAAAT6TAG GTATTTCCAT 
CIGGATAHA CCAGCAAGGC 
ACTAATCAAA GAAGTAHGC TACAACGGTT 
GGTGGCCTCA CTGAHATAA AAACACTTCT 
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GGT TCGTTCGGTA 
AAT A6CCATTCAA 

TCT ATCTCTGHG 

ATCTGCCAAT GTAAATAATC 

GAGCGmrr cctghgcaa 

CGATAGTTTG AGTTCTTCTA 
AATTTGCGTG 
CAA6ATTCTG 



ATCCCTTTAA TCGGCCTCCT GTTTAGCTCC CGCTCTGATT 
TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC 



5521 TGTGGTGGn AC6CGCAGCG 
5581 CGCTTTCTTC CCTTCCITTC 
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TGACCGCTAC ACHGCCAGC 
TCGCCACGH CGCCGGCHT 
GATTTAGTGC TTTACGGCAC 
GTGGGCCATC GCCCTGATAG 
ATAGTGGACT CHGnCCAA 
A1TTATAAGG GATTTTGCCG 



TTTGGGTGAT GGHCACGTA 
GnGGAGTCC AC6TTCTTTA 
TATCTC6GGC TATTCTTTTG 
CAGGATmC GCCTGCTGGG 
CAGGCGGTGA AGGGCAATCA 
6CGCCCAATA CGCAAACCGC 
CGACAGGITT CCCGACTGGA 

. CACTGATTAG GCACCCCAGG 

6181 TGTGAGCGGA TAACAATTTC 
6211 GTAGGA6AGC TCGGCGGATC 

" AGITTACAGG CAAGTGCTAC 

GnCGTGCTA CCATAGGGAT TAAATTATTC 
GCTGGCGTAA TAGCGAAGAG GCCCGCACCG 
ATGGCGAAT6 GCGCTHGCC TGGTTTCCGG 
AGTGCGATCT TCCTGAGGCC GATACGGTCG 



TTTTTAATGG CGATGTnTA 1980 
AAATATTGTC T6TGCCACGT 5010 
GCCAGAATGT CCCniTAn 5100 
CATHCAGAC GATTGAGC6T 5160 
TGGCTGGCGG TAATAnGTT 5220 
CTCAGGCAA6 TGATGHAn 5280 
ATGGACAGAC TCTTTTACTC 5310 
GCGTACCGTT CCTGTCTAAA 5100 
CCAACGAGGA AAGCACGHA 5160 
6GCGCATTAA GCGCGGCGGG 5520 
GCCCTAGCGC CCGCTCCHT 5580 
CCCCGTCAAG CTCTAAATCG 5610 
CTCGACCCCA AAAAACTTGA 5700 
X GCCCTTTGftC 5750 
CACTCAACCC 5820 
CACCATCAAA 5880 
CTCTCAGGGC 5910 
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GGGCTATGGT 
CGAGCAAGGC 
CCAACAGTTG 
GGTGCCGGAA 
AAACTGGCAG 
GGTCAATCCG 
TGTTGATGAA 



ACGATGCGCC CATCTACACC AACGTAACCT 
CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA 
AGGAAG6CCA GACGCGAATT ATTTnGATG GCGHCCTAT TGGTTAAAAA 
TTAACAAAAA HTAACGCGA ATnTAACAA AATATTAACG THACAAnT 
TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC 
CATGCTAGH nACGATTAC CGTTCATCGA nCTCnGTT TGCTCCAGAC 
TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA 
AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC 
niTGAATCT HACCTACAC ATTACTCAG6 CATTGCATTT AAAATATATG 

. - .- AA ATTTT TAT CCnGCGTTG AAATAAAGGC HCTCCCGCA AAAGTATTAC 

7201 TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA
7261 TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 
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TGCAHCAAT 6300 
AGTAGTTATA 6360 
TTCTTAACCA 6120 
CGCAGCCTGA 6180 
AGCTGGCTGG 6510 
ATGCACGGH 6500 
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AGCTGGCTAC 6720 
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1 AATGCTACTA CTAHAGTAG AATTGATGCC ACCniTCAG CTCGCGCCCC AAATGAAAAT 60 
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTCGTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 AHCACCTCG AAAGCAAGCT GATAAACCGA TACAAHAAA GGCTCCnn G6AGCCTTTT 1560 
1561 TTTTT66AGA niTCAACGT GAAAAAATTA nATTCGCAA TTCCTTTAGT TGHCCTnC 1520 
1621 TATTCTCACT CCGCTGAAAC TGHGAAAGT TGHTAGCAA AACCCCATAC AGAAAATTCA 1680 
1681 TTTACTAACG TCT6GAAAGA GCACAAAACT TTAGATCGH ACGCTAACTA TGAGGGTTGT 1740 
1741 CTGTGGAAT6 CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCA6TG TTACGGTACA 1800 
1801 TGGGnCCTA HGGGCnGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGT6GCGGT 1860 
1861 TCTGA6G6TG GCGGHCTGA G6GTGGCGGT ACTAAACCTC CTGAGTACGG T6ATACACCT 1920 
1921 AHCCGGGCT ATACHATAT CAACCCTCTC GACGGCACH ATCCGCCTGG TACTGAGCAA 1980 
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCHAATAC TTTCATGTrT 2040 
2041 CAGAATAATA GGHCCGAAA TA6GCAGGG6 GCATTAACTG THATACGGG CACTGHACT 2100 
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAA6CCATG 2160 
2161 TATGACGCTT ACT6GAACGG TAAAHCAGA GACTGCGCTT TCCATTCTGG CHTAATGAA 2220 
2221 GATCCAHCG HTGIGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 
2281 GCTGGC66CG GCTCTG6TGG TGGnCTGGT GGCGGCTCTG AGGGTG6TGG CTCTGA6GGT 2340 
2341 GGCGGHCTG AGG6TGGCGG CTCT6AGG6A GGCGGTTCCG GTGGTGGCTC IGGHCCGGT 2400 
2401 GATHTGATT ATGAAAAGAT GGCAAACGCT AATAA6GG6G CTATGACC6A AAAT6CC6AT 2460 
2461 GAAAAC6CGC TACAGTCTGA CGCTAAA6GC AAACHGAH CTGTCGCTAC T6ATTACG6T 2520 
2521 GCTGCTATCG AIGGHTCAT TGGTGACGH TCCGGCCHG CTAATG6TAA TGGTGCTACT 2580 
2581 G6TGA1TTTG CTGGCTCTAA TTCCCAAAT6 GCTCAAGTCG GTGACGGT6A TAAHCACCT 2640 
2641 TTAATGAATA ATHCCGTCA ATAmACCT TCCCTCCCTC AATCGGTTGA ATGTC6CCCT 2700 
2701 TTTGTCTTTA GCGCTGGTAA ACCATAT6AA TTTTCTAnG ATTGT6ACAA AATAAACTTA 2760 
2761 nCCGTGGTG TCnTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGnCTTiTG GGTAHCCGT 2880 
2881 TATTAHGCG TTTCCTCGGT nCCTTCTGG TAACHTGIT CGGCTATCTG CnACTTTTC 2940 
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTAHG CTATHCATT GITTCHGCT CTTAHAnG 3000 
3001 GGCHAACTC AAHCTTGIG GGHATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3050 
3051 HGHCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTnT TATGnATTC 3120 
3121 TCTCTGTAAA GGCTGCTATT nCATTTTTG ACGTTAAACA AAAAATCGH TCTTATnGG 3180 
3181 ATTGGGATAA ATAATATGGC TGTnATTTT GTAACTGGCA' AATTA6GCTC TGGAAAGACG 3240 
3241 CTCGTTAGCG TT6GTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 
3301 CTTGATTTAA GGCHCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3350 
3351 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTT6CTTG CTAHGGGCG CGGTAATGAT 3420 
3421 TCCTACGATG AAAATAAAAA CGGCHGCn GHCTCGATG AGTGCGGTAC nGGlTTAAT 3480 
3481 ACCCGTTCTT GGAAT6ATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTC6T 3540 
3541 AAATTAGGAT GGGATATTAT TTTTCnGTT CAGGACHAT -CTATTGHGA TAAACAGGCG 3500 
3501 CGHCTGCAT TAGCTGAACA TGTTGnTAT T6TCGTC6TC TGGACAGAAT TACTTTACCT 3560 
3551 TTTGTC66TA CHTATATTC TCTTATTACT 6GCTCGAAAA TGCCTCTGCC TAAAHACAT 3720 
3721 GnGGCGHG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGA6C6 TTGGCTTTAT 3780 
3781 ACTG6TAAGA ATTT6TATAA C6CATAT6AT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 
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3841 TCCGGTGTTT AnCTTATTT AACGCCTTAT TTATCACACG GTCGGTATH CAAACCAHA 3900 
3901 AArrTAGGTC AGAAGATGAA GCHACTAAA ATATATTTGA AAAA6TTTTC ACGC6TTCTT 3950 
3951 TCrCTTGCGA TTGGATTT6C ATCAGCATTT ACATATAGH ATATAACCCA ACCTAA6CCG A020 
1021 GftGGTTAAAA AGGTAGTCTC TCAGACCTAT GATHTGATA AATTCACTAT TGACTCTTCT 4080 
1081 CAGCGTCHA ATCTAA6CTA TCGCTATGH TTCAAGGATT CTAAGG6AAA ATTAATTAAT 4110 
1111 AGCGACGAH TACAGAAGCA AGGTTAHCA CTCACATATA TTGATTTATG TACTGTTTCC 1200 
1201 ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTn TCHGATGIT 1260 
1261 TGrrrCATCA TCnCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATH 1320 
1321 TGTAACTT6G TATTCAAAGC AATCAG6CGA ATCCGTTATT GTITCTCCCG ATGTAAAA6G 1380 
1381 TACTGHACT GTATATTCAT CT6ACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 1110 
4111 TGTTrTACGT GCTAATAAH TT6ATATGGT TGGTTCAAn CCTTCCATAA HCAGAAGTA 1500 
1501 TAATCCAAAC AATCAGGAH ATAHGATGA ATT6CCATCA TCT6ATAATC AGGAATATGA 4560 
1551 TGATAAHCC GCTCCTTCTG GTGGmCTT TGTTCCGCAA AATGATAATG HACTCAAAC 4520 
1621 TTTTAAAAn AATAACGHC GG6CAAAGGA HTAATACGA GTTGTC6AAT TGITTGTAAA 4580 
1681 GTCTAATACT TCTAAATCCT CAAATGTATT ATCTAHGAC GGCTCTAATC TAHAGnGT 1710 
1711 TAGT6CACCT AAAGATATH TAGATAACCT TCCTCAAHC CTTTCTACTG TTGATTTGCC 1800 
1801 AACT6ACCAG ATAnGATTG AGGGTTTGAT ATTTGAGGn CA6CAA6GTG ATGCHTAGA 1860 
1851 TiTTTCAlTT GCTGCTGGCT CTCAGCGTGG CACTGHGCA GGCGGTGTTA ATACTGACCG 4920 
1921 CCTCACCTCT GTTTTATCn CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGirTT 4980 
1981 AGGGaATCA GHCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 
5011 TAHCTTACG CTTTCAGGTC AGAAGGGHC TATCTCTGH GGCCAGAATG TCCCTTTTAT 5100 
5101 TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCAHTCAGA CGATTGAGCG 5160 
5151 TCAAAATGTA GGTATTTCCA TGAGCGTm TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220 
5221 TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGHAT 5280 
5281 TACTAATCAA AGAAGTATTG CTACAACGGT TAATHGCGT GAT6GACAGA CTCTTTTACT 5310 
5311 CGGTGGCCTC ACTGATTATA AAAACACHC TCAAGATTCT 6GCGTACCGT TCCT6TCTAA 5100 
AAKCCTTTA ATC6GCCTCC TGTTTAGCTC CCGCTCTGAT TCCAAC6A6G AAAGCACGTT 5160 
5^61 ATAC6TGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGC6GCG6 5520 
5521 GTGT6GT66T TACGCGCAGC 6TGACCGCTA CACTT6CCA6 CGCCCTAGCG CCCGCTCCH 5580 
5581 TCGCTTTCTT CCCTTCCTTT CTCGCCAC6T TCGCCGGCH TCCCCGTCAA 6CTCTAAATC 5610 
6GGGGCTCCC TTTAGGGnC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTT6 5700 
ATTTGGGTGA T6GTTCACGT AGTG6GCCAT CGCCCTGATA GACGGTrTTT CGCCCTTTGA 5750 
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CGTT6GA6TC CAC6T TCTTT AATA6T66AC TCHGnCCA AACTGGAACA ACACTCAACC 5820 
CTATCTCGGG CTATTCTTTT 6ATTTATAAG GGATTTT6CC GATHCGGAA CCACCATCAA 5880 
ACAGGATTTT CGCaGCTGG G6CAAACCAG CGTG6ACCGC HGCTGCAAC TCTCTCAG6G 5910 
CCAG6CGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 5000 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 5060 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 5120 
6121 TCACTCAHA GGCACCCCAG GCTTTACACT TTAT6CTTCC GGCTCGTATG TTGTGTGGAA 5180 
6181 nCTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT G6CCGTCGTT TTACAACGTC 5210 
6211 GTGACTGGGA AAACCCTGGC GHACCCAAG CTTT6TACAT GGAGAAAATA AAGTGAAACA 5300 
6301 AAGCACTAn GCACTGGCAC TCTTACCGH ACTGTTTACC CCTGTGGCAA AAGCCCTTCT 6350 
6361 GA6GCATCCG G6A6CTGAAG GCGATGACCC TGCTAAGGCT GCAHCAATA GTTTACAG6C 5120 
6421 AA6T6CTACT GAGTACATTG GCTAC6CTTG G6CTATG6TA GTA6TTATAG TTGGT6CTAC 5480 
6181 CATA666ATT AAATTATTCA AAAAGTrTAC GAGCAAGGCT TCTTAAGCAA TAGCGAAGAG 5510 
6511 GCCCGCACCG ATCGCCCHC CCAACAGTT6 CGCAGCCTGA AT66CGAATG GCGCTTTGCC 6600 
6501 T66TTTCCGG CACCAGAA6C GGTGCC6GAA AGCTGGCTGG AGT6CGATCT TCCTGA6GCC 5660 
6561 GATACGGTC6 TCGTCCCCTC AAACTGGCAG ATGCACG6TT ACGATGCGCC CATCTACACC 5720 
6721 AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC CCACGGAGAA TCCGACG6GT 5780 
6781 TGHACTCGC TCACATTTAA TGTT6ATGAA AGCT6GCTAC AGGAA6GCCA 6ACGCGAATT 6810 
6811 ATTTTIGATG 6CGTTCCTAT TGGTTAAAAA ATGAGCTGAT TTAACAAAAA THAACGCGA 6900 
6901 ATTTTAACAA AATATTAACG TnACAATn AAATATHGC HATACAATC nCCTGTTTT 6960 
6961 TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGAHGA CATGCTAGTT TTACGATTAC 7020 
7021 CGHCATCGA nCTCTTGTT TGCTCCAGAC TCTCAGGCAA TGACCTGATA GCCTTTGTAG 7080 
7081 ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAA1TTATC AGCTAGAACG GHGAATATC 7110 
7111 ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC nHGAATCT TTACCTACAC 7200 
7201 AHACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA AAATTTTTAT CCTTGCGTTG 7260 
7261 AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA TGTTTTTGGT ACAACCGAH 7320 
7321 TAGCTTTATG CTCTGAGGCT TTAnGCTTA ATTTTGCTAA nCTTTGCCT TGCCTGTATG 7380 
7381 ATTTATTGGA CGTT 7394
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